1. 13 Nov, 2017 40 commits
    • afs: Overhaul the callback handling · c435ee34
      David Howells committed
      Overhaul the AFS callback handling by the following means:
      
       (1) Don't give up callback promises on vnodes that we are no longer using,
           rather let them just expire on the server or let the server break
           them.  This is actually more efficient for the server as the callback
           lookup is expensive if there are lots of extant callbacks.
      
       (2) Only give up the callback promises we have from a server when the
           server record is destroyed.  Then we can just give up *all* the
           callback promises on it in one go.
      
       (3) Servers can end up being shared between cells if cells are aliased, so
           don't add all the vnodes being backed by a particular server into a
           big FID-indexed tree on that server as there may be duplicates.
      
           Instead have each volume instance (~= superblock) register an interest
           in a server as it starts to make use of it and use this to allow the
           processor for callbacks from the server to find the superblock and
           thence the inode corresponding to the FID being broken by means of
           ilookup_nowait().
      
       (4) Rather than iterating over the entire callback list when a mass-break
           comes in from the server, maintain a counter of mass-breaks in
           afs_server (cb_seq) and make afs_validate() check it against the copy
           in afs_vnode.
      
           It would be nice not to have to take a read_lock whilst doing this,
           but that's tricky without using RCU.
      
       (5) Save a ref on the fileserver we're using for a call in the afs_call
           struct so that we can access its cb_s_break during call decoding.
      
       (6) Write-lock around callback and status storage in a vnode and read-lock
           around getattr so that we don't see the status mid-update.
      
      This has the following consequences:
      
       (1) Data invalidation isn't seen until someone calls afs_validate() on a
           vnode.  Unfortunately, we need to use a key to query the server, but
           getting one from a background thread is tricky without caching loads
           of keys all over the place.
      
       (2) Mass invalidation isn't seen until someone calls afs_validate().
      
       (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
           Could this be replaced with rcu_read_lock(), since inodes are
           destroyed under RCU conditions?
      Signed-off-by: David Howells <dhowells@redhat.com>
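The mass-break counter described in point (4) of the commit above can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the struct and function names are hypothetical stand-ins for afs_server, afs_vnode and afs_validate(), and locking is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for afs_server and afs_vnode. */
struct server_sketch {
	unsigned int cb_s_break;	/* bumped once per mass-break from the server */
};

struct vnode_sketch {
	unsigned int cb_s_break;	/* server counter value when the promise was stored */
	bool cb_promised;		/* true while we believe we hold a callback promise */
};

/* Record a fresh callback promise obtained from the server. */
static void vnode_store_callback(struct vnode_sketch *v,
				 const struct server_sketch *s)
{
	v->cb_s_break = s->cb_s_break;
	v->cb_promised = true;
}

/* The server broke all its promises: O(1), no walk over the vnodes. */
static void server_mass_break(struct server_sketch *s)
{
	s->cb_s_break++;
}

/* Returns true if the cached state is still covered by a promise. */
static bool vnode_validate(struct vnode_sketch *v,
			   const struct server_sketch *s)
{
	if (v->cb_promised && v->cb_s_break != s->cb_s_break)
		v->cb_promised = false;	/* promise predates a mass-break */
	return v->cb_promised;
}
```

The cost of a mass-break moves from the break handler to the next validation of each vnode, which matches consequence (2) in the commit message: the invalidation is not seen until someone validates.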
    • afs: Rename struct afs_call server member to cm_server · d0676a16
      David Howells committed
      Rename the server member of struct afs_call to cm_server as we're only
      going to be using it for incoming calls for the Cache Manager service.
      This makes it easier to differentiate from the pointer to the target server
      for the client, which will point to a different structure to allow for
      callback handling.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix the afs_uuid struct to make the char-sized fields signed · 03dc2cfc
      David Howells committed
      In AFS's encoding of a UUID, the eight 'char' fields are all signed, so
      represent them with __s8 rather than __u8.  This makes the compiler
      sign-extend them correctly when XDR-encoding them.
      Signed-off-by: David Howells <dhowells@redhat.com>
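The effect of the signedness change in the afs_uuid commit above can be shown with a small C sketch. The helper names are hypothetical, and the byte-order conversion that real XDR encoding also performs is omitted; only the widening to a 32-bit word is illustrated.

```c
#include <stdint.h>

/* XDR carries each small field in a 32-bit word; only the widening
 * step is shown here, without the conversion to network byte order. */
static uint32_t xdr_word_from_s8(int8_t field)
{
	return (uint32_t)(int32_t)field;	/* sign-extends: 0x80 -> 0xffffff80 */
}

static uint32_t xdr_word_from_u8(uint8_t field)
{
	return (uint32_t)field;			/* zero-extends: 0x80 -> 0x00000080 */
}
```

With an unsigned field type, any byte with the top bit set would be encoded differently from what a peer using signed chars produces.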
    • afs: Connect up the CB.ProbeUuid · f4b3526d
      David Howells committed
      The handler for the CB.ProbeUuid operation in the cache manager is
      implemented, but isn't listed in the switch-statement of operation
      selection, so won't be used.  Fix this by adding it.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Potentially return call->reply[0] from afs_make_call() · 33cd7f2b
      David Howells committed
      If call->ret_reply0 is set, return call->reply[0] on success.  Change the
      return type of afs_make_call() to long so that this can be passed back
      without bit loss and then cast to a pointer if required.
      Signed-off-by: David Howells <dhowells@redhat.com>
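A rough sketch of the return convention described in the afs_make_call() commit above: a long carries either a negative errno or, when ret_reply0 is set, the first reply pointer. The struct and function names are hypothetical, and the error test borrows the common kernel convention that the top 4095 values of the address space are reserved for errno codes. The sketch assumes long is as wide as a pointer, which holds on Linux but not on every platform.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down version of struct afs_call. */
struct call_sketch {
	bool ret_reply0;	/* caller wants reply[0] handed back */
	void *reply[4];		/* reply anchors, condensed into an array */
	int error;		/* 0 on success or a negative errno */
};

/* Returns a negative errno, 0, or reply[0] cast to long. */
static long make_call_sketch(const struct call_sketch *call)
{
	if (call->error < 0)
		return call->error;
	if (call->ret_reply0)
		return (long)call->reply[0];
	return 0;
}

/* Assumed convention: values in the top 4095 of the range are errors. */
static bool ret_is_error(long ret)
{
	return (unsigned long)ret >= (unsigned long)-4095L;
}
```

A caller that set ret_reply0 would check ret_is_error() first and only then cast the value back to the expected pointer type.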
    • afs: Condense afs_call's reply{,2,3,4} into an array · 97e3043a
      David Howells committed
      Condense struct afs_call's reply anchor members - reply{,2,3,4} - into an
      array.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Consolidate abort_to_error translators · f780c8ea
      David Howells committed
      The AFS abort code space is shared across all services, so there's no need
      for separate abort_to_error translators for each service.
      
      Consolidate them into a single function and remove the function pointers
      for them.
      Signed-off-by: David Howells <dhowells@redhat.com>
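Because the abort code space is shared, a single translator can be a plain switch with no per-service function pointer, as the commit above describes. The sketch below is illustrative only: the enum names, the numeric values and the chosen errno mappings are assumptions covering a small subset of codes, and the fallback errno is a placeholder rather than what the kernel returns.

```c
#include <errno.h>

/* Illustrative subset of AFS abort codes; values are assumptions. */
enum abort_sketch {
	SK_VNOVNODE   = 102,
	SK_VDISKFULL  = 108,
	SK_VOVERQUOTA = 109,
	SK_VBUSY      = 110,
};

/* One translator shared by every service. */
static int abort_to_error_sketch(unsigned int abort_code)
{
	switch (abort_code) {
	case SK_VNOVNODE:	return -ENOENT;
	case SK_VDISKFULL:	return -ENOSPC;
	case SK_VOVERQUOTA:	return -EDQUOT;
	case SK_VBUSY:		return -EBUSY;
	default:		return -EIO;	/* placeholder fallback */
	}
}
```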
    • afs: Allow IPv6 address specification of VL servers · 3838d3ec
      David Howells committed
      Allow VL server specifications to be given IPv6 addresses as well as IPv4
      addresses, for example as:
      
      	echo add foo.org 1111:2222:3333:0:4444:5555:6666:7777 >/proc/fs/afs/cells
      
      Note that ':' is the default separator between IPv4 addresses, but if a
      ',' is detected or no '.' is detected in the string, the delimiter is
      switched to ','.
      
      This also works with DNS AFSDB or SRV record strings fetched by upcall from
      userspace.
      Signed-off-by: David Howells <dhowells@redhat.com>
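The delimiter rule stated in the commit above is small enough to write out directly. This is a sketch of the rule as worded in the message, with a hypothetical function name; it does not parse the addresses themselves.

```c
#include <string.h>

/* Pick the separator for a VL server address list: ':' by default,
 * switching to ',' if a ',' is present or no '.' is present (in which
 * case the colons must belong to IPv6 addresses). */
static char vl_addr_delimiter(const char *list)
{
	char delim = ':';

	if (strchr(list, ',') || !strchr(list, '.'))
		delim = ',';
	return delim;
}
```

A colon-separated IPv4 list such as "1.2.3.4:5.6.7.8" keeps ':' as its separator, while a string containing only an IPv6 address is treated as a ','-separated list with one entry.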
    • afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr · 4d9df986
      David Howells committed
      Keep and pass sockaddr_rxrpc addresses around rather than keeping and
      passing in_addr addresses to allow for the use of IPv6 and non-standard
      port numbers in future.
      
      This also allows the port and service_id fields to be removed from the
      afs_call struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Update the cache index structure · ad6a942a
      David Howells committed
      Update the cache index structure in the following ways:
      
       (1) Don't use the volume name followed by the volume type as levels in the
           cache index.  Volumes can be renamed.  Use the volume ID instead.
      
       (2) Don't store the VLDB data for a volume in the tree.  If the volume
           database should be cached locally, then it should be done in a separate
           tree.
      
       (3) Expand the volume ID stored in the cache to 64 bits.
      
       (4) Expand the file/vnode ID stored in the cache to 96 bits.
      
       (5) Increment the cache structure version number to 1.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Add some protocol defs · 91a90380
      David Howells committed
      Add some protocol definitions, including max field lengths, flag defs, an
      XDR-encoded UUID def, more VL operation IDs and more fileserver abort
      codes.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Push the net ns pointer to more places · 9ed900b1
      David Howells committed
      Push the network namespace pointer to more places in AFS, including the
      afs_server structure (which doesn't hold a ref on the netns).
      
      In particular, afs_put_cell() now requires a net ns parameter so that
      it can safely alter the netns after decrementing the cell usage count - the
      cell will be deallocated by a background thread after being cached for a
      period, which means that it's not safe to access it after reducing its
      usage count.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Note the cell in the superblock info also · 49566f6f
      David Howells committed
      Keep a reference to the cell in the superblock info structure in addition
      to the volume and net pointers.  This will make it easier to clean up in a
      future patch in which afs_put_volume() will need the cell pointer.
      
      Whilst we're at it, make the cell and volume getting functions return a
      pointer to the object got to make the call sites look neater.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix server reaping · 59fa1c4a
      David Howells committed
      Fix server reaping and make sure it's all done before we start trying to
      purge cells, given that servers currently pin cells.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Close the rxrpc socket only after purging the servers · e3b2ffe0
      David Howells committed
      Close the rxrpc socket only after we've purged the server records (and also
      cell and volume records which might refer to servers) so that we can give
      up the callbacks on each server.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Lay the groundwork for supporting network namespaces · f044c884
      David Howells committed
      Lay the groundwork for supporting network namespaces (netns) in the AFS
      filesystem by moving various global features to a network-namespace struct
      (afs_net) and providing an instance of this as a temporary global variable
      that everything uses via accessor functions for the moment.
      
      The following changes have been made:
      
       (1) Store the netns in the superblock info.  This will be obtained from
           the mounter's nsproxy on a manual mount and inherited from the parent
           superblock on an automount.
      
       (2) The cell list is made per-netns.  It can be viewed through
           /proc/net/afs/cells and also be modified by writing commands to that
           file.
      
       (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
           This is unset by default.
      
       (4) The 'rootcell' module parameter, which sets a cell and VL server list,
           modifies the init net namespace, thereby theoretically allowing an
           AFS root fs to be used.
      
       (5) The volume location lists and the file lock manager are made
           per-netns.
      
       (6) The AF_RXRPC socket and associated I/O bits are made per-ns.
      
      The various workqueues remain global for the moment.
      
      Changes still to be made:
      
       (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
           from the old name.
      
       (2) A per-netns subsys needs to be registered for AFS into which it can
           store its per-netns data.
      
       (3) Rather than the AF_RXRPC socket being opened on module init, it needs
           to be opened on the creation of a superblock in that netns.
      
       (4) The socket needs to be closed when the last superblock using it is
           destroyed and all outstanding client calls on it have been completed.
           This prevents a reference loop on the namespace.
      
       (5) It is possible that several namespaces will want to use AFS, in which
           case each one will need its own UDP port.  These can either be set
           through /proc/net/afs/cm_port or the kernel can pick one at random.
           The init_ns gets 7001 by default.
      
      Other issues that need resolving:
      
       (1) The DNS keyring needs net-namespacing.
      
       (2) Where do upcalls go (eg. DNS request-key upcall)?
      
       (3) Need something like open_socket_in_file_ns() syscall so that AFS
           command line tools attempting to operate on an AFS file/volume have
           their RPC calls go to the right place.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Pass mode to wait_on_atomic_t() action funcs and provide default actions · 5e4def20
      David Howells committed
      Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
      extra argument and make it 'unsigned int' throughout.
      
      Also, consolidate a bunch of identical action functions into a default
      function that can do the appropriate thing for the mode.
      
      Also, change the argument name in the bit_wait*() function declarations to
      reflect the fact that it's the mode and not the bit number.
      
      [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
      should be done differently, though he's not immediately sure as to how]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      cc: Ingo Molnar <mingo@kernel.org>
    • Merge remote-tracking branch 'tip/timers/core' into afs-next · 81445e63
      David Howells committed
      These AFS patches need the timer_reduce() patch from timers/core.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Merge branch 'net-improve-the-process-of-redirect-and-toobig-for-ipv6-tunnels' · ede372dc
      David S. Miller committed
      Xin Long says:
      
      ====================
      net: improve the process of redirect and toobig for ipv6 tunnels
      
      There are three kinds of icmp packets for tunnels to process: toobig
      (needfrag), redirect, and others. They should be handled as follows:
      
       - toobig(needfrag)
         update the lower dst's pmtu by route cache, also update sk dst's pmtu
         if possible, or it will be fine if sk dst pmtu will get updated on tx
         path.
      
       - redirect
         update the lower dst's gw by route cache and return, no need to send
         this redirect packet to user sk.
      
       - others
         send the packet to user's sk, or it will also be fine to use err_count
         to count it and report fail link on tx path.
      
      All ipv4 tunnels basically follow this, while some ipv6 tunnels behave
      differently: ip6gre and ip6_tunnel update the tnl dev's mtu instead of
      updating the lower dst pmtu, and their err_handlers do no redirect
      processing. This doesn't make any sense and even causes performance
      problems.
      
      This patchset improves the handling of redirect and toobig for ip6gre,
      ip4ip6 and ip6ip6 tunnels, matching the ipv4 tunnels.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
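The three-way handling laid out in the cover letter above can be summarised as a classifier on the ICMPv6 type. This is a hedged sketch, not code from the patchset: the enum and function names are hypothetical, and only the two type numbers (2 for Packet Too Big, 137 for Redirect) come from the ICMPv6 and Neighbor Discovery specifications.

```c
/* What a tunnel error handler should do with an incoming ICMPv6 error. */
enum tnl_err_action {
	TNL_UPDATE_LOWER_PMTU,	/* toobig: update the lower dst's pmtu */
	TNL_UPDATE_LOWER_GW,	/* redirect: update the lower dst's gateway */
	TNL_REPORT_TO_SK,	/* others: pass on to the user's socket */
};

#define SK_ICMPV6_PKT_TOOBIG	2	/* ICMPv6 Packet Too Big */
#define SK_NDISC_REDIRECT	137	/* Neighbor Discovery Redirect */

static enum tnl_err_action tnl_classify_icmp6(unsigned char type)
{
	switch (type) {
	case SK_ICMPV6_PKT_TOOBIG:
		return TNL_UPDATE_LOWER_PMTU;
	case SK_NDISC_REDIRECT:
		return TNL_UPDATE_LOWER_GW;
	default:
		return TNL_REPORT_TO_SK;
	}
}
```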
    • ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers · 77552cfa
      Xin Long committed
      This patch removes some dead redirect code and fixes some indentation
      in the ip4ip6 and ip6ip6 err_handlers.
      
      Note that redirect icmp packet is already processed in ip6_tnl_err,
      the old redirect codes in ip4ip6_err actually never worked even
      before this patch. Besides, there's no need to send redirect to
      user's sk, it's for lower dst, so just remove it in this patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: process toobig in a better way · b00f5432
      Xin Long committed
      The same improvement in "ip6_gre: process toobig in a better way"
      is needed by ip4ip6 and ip6ip6 as well.
      
      Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their
      err_handlers. Like I said before, gre6 could not do this as its
      inner proto is not certain. But for all of them, sk dst pmtu will
      be updated in the tx path if needed.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: add the process for redirect in ip6_tnl_err · 383c1f88
      Xin Long committed
      The same process for redirect in "ip6_gre: add the process for redirect
      in ip6gre_err" is needed by ip4ip6 and ip6ip6 as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: process toobig in a better way · fe1a4ca0
      Xin Long committed
      Currently ip6gre processes a toobig icmp packet by setting the gre dev's
      mtu in ip6gre_err, which causes several problems:
      
        - It can't set the mtu with dev_set_mtu because it's not in user context,
          so the route cache and idev->cnf.mtu6 are not updated.
      
        - It has to update sk dst pmtu in tx path according to gredev->mtu for
          ip6gre, while it updates pmtu again according to lower dst pmtu in
          ip6_tnl_xmit.
      
        - To change dev->mtu by toobig icmp packet is not a good idea, it should
          only work on pmtu.
      
      This patch is to process toobig by updating the lower dst's pmtu, as later
      sk dst pmtu will be updated in ip6_tnl_xmit, the same way as in ip4gre.
      
      Note that gre dev's mtu will not be updated any more, it doesn't make any
      sense to change dev's mtu after receiving a toobig packet.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: add the process for redirect in ip6gre_err · 929fc032
      Xin Long committed
      This patch is to add redirect icmp packet process for ip6gre by
      calling ip6_redirect() in ip6gre_err(), as in vti6_err.
      
      Prior to this patch, there's even no route cache generated after
      receiving redirect.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • forcedeth: remove redudant assignments in xmit · 0d728b84
      Zhu Yanjun committed
      In the xmit path, several variables are assigned many times, although
      it is enough for each of them to be set once.
      After a long-running test, the throughput performance is better
      than before.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Joe Jin <joe.jin@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'nfc-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next · 6afce196
      David S. Miller committed
      Samuel Ortiz says:
      
      ====================
      NFC 4.15 pull request
      
      This is the NFC pull request for 4.15. We have:
      
      - A new netlink command for explicitly deactivating NFC targets
      - i2c constification for all NFC drivers
      - One NFC device allocation error path fix
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Openvswitch-meter-action' · fd9080a3
      David S. Miller committed
      Andy Zhou says:
      
      ====================
      Openvswitch meter action
      
      This patch series is the first attempt to add openvswitch
      meter support. We have previously experimented with adding
      metering support in nftables. However 1) It was not clear
      how to expose a named nftables object cleanly, and 2)
      the logic that implements metering is quite small, < 100 lines
      of code.
      
      With those two observations, it seems cleaner to add meter
      support in the openvswitch module directly.
      
      ---
      
          v1(RFC)->v2:  remove unused code improve locking
      		  and other review comments
          v2 -> v3:     rebase
          v3 -> v4:     fix undefined "__udivdi3" references on 32 bit builds.
                        use div_u64() instead.
          v4 -> v5:     rebase
      ====================
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add meter action support · cd8a6c33
      Andy Zhou committed
      Implements OVS kernel meter action support.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add meter infrastructure · 96fbc13d
      Andy Zhou committed
      OVS kernel datapath so far does not support Openflow meter action.
      This is the first stab at adding kernel datapath meter support.
      This implementation supports only drop band type.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
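To give a feel for what a meter with a single drop band does, here is a generic token-bucket sketch in C. It is not the OVS datapath algorithm: the struct layout, the units (tokens per millisecond) and the function name are all assumptions, and multiplication overflow for very long idle periods is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-band meter: packets beyond the rate are dropped. */
struct meter_sketch {
	uint64_t rate;		/* tokens added per millisecond */
	uint64_t burst;		/* bucket capacity in tokens */
	uint64_t bucket;	/* tokens currently available (<= burst) */
	uint64_t last_ms;	/* time of the previous update */
};

/* Returns true if a packet costing `cost` tokens should be dropped. */
static bool meter_should_drop(struct meter_sketch *m, uint64_t now_ms,
			      uint64_t cost)
{
	uint64_t refill = (now_ms - m->last_ms) * m->rate;

	m->last_ms = now_ms;
	/* Refill, clamping at the burst size. */
	if (m->burst - m->bucket < refill)
		m->bucket = m->burst;
	else
		m->bucket += refill;

	if (m->bucket < cost)
		return true;	/* over the limit: drop band triggers */
	m->bucket -= cost;
	return false;
}
```

In this sketch a dropped packet does not consume tokens; whether that matches a given meter implementation is a design choice that should be checked against the real code.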
    • openvswitch: export get_dp() API. · 9602c01e
      Andy Zhou committed
      Later patches will invoke get_dp() outside of datapath.c. Export it.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add meter netlink definitions · 57940406
      Andy Zhou committed
      Meter has its own netlink family. Define netlink messages and attributes
      for communicating with the user space programs.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-b53-Support-prepended-Broadcom-tags' · aef1e0d5
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: dsa: b53: Support prepended Broadcom tags
      
      This patch series adds support for a prepended form of the 4-byte Broadcom
      tag that we already support. This type of tag will typically be used when interfacing to
      a SoC like BCM58xx (NorthStar Plus) which supports a Flow Accelerator (WIP).
      In that case, we need to support a slightly different tagging format.
      
      The first patch does a bit of re-factoring and passes a port index to
      the get_tag_protocol() function since at least two different drivers need
      that type of information (mt7530, b53) to support tagging or not.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Support prepended Broadcom tags · 11606039
      Florian Fainelli committed
      On BCM58xx devices (Northstar Plus), there is an accelerator attached to
      port 8 which would only work if we use prepended Broadcom tags. Resolve
      that difference in our get_tag_protocol() function by setting the
      appropriate tagging protocol in that case. We need to change
      b53_brcm_hdr_setup() a little bit now since we can deal with two types
      of Broadcom tags.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Support prepended Broadcom tag · b74b70c4
      Florian Fainelli committed
      Add a new type, DSA_TAG_PROTO_PREPEND, which allows us to support the
      4-byte Broadcom tag that we already support, but in a format where it
      is prepended to the packet instead of located between the MAC SA and
      the EtherType (DSA_TAG_PROTO_BRCM).
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_brcm: Prepare for supporting prepended tag · f7c39e3d
      Florian Fainelli committed
      In preparation for supporting the same Broadcom tag format, but instead
      of inserted between the MAC SA and EtherType, prepended to the Ethernet
      frame, restructure the code a little bit to make that possible and take
      an offset parameter.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
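The offset parameter mentioned in the commit above is what lets one routine serve both tag placements. The sketch below works on plain byte buffers rather than skbs, and every name in it is hypothetical; the only facts taken from the commit messages are that the tag is 4 bytes long and sits either in front of the frame or between the MAC source address and the EtherType.

```c
#include <stddef.h>
#include <string.h>

#define SK_ETH_ALEN	6	/* bytes in one MAC address */
#define SK_BRCM_TAG_LEN	4	/* bytes in the Broadcom tag */

enum tag_proto_sketch { TAG_BRCM, TAG_BRCM_PREPEND };

/* Offset of the tag from the start of the frame. */
static size_t brcm_tag_offset(enum tag_proto_sketch proto)
{
	/* In-frame tags follow the destination and source MAC addresses. */
	return proto == TAG_BRCM_PREPEND ? 0 : 2 * SK_ETH_ALEN;
}

/* Copy `frame` into `out` with `tag` inserted at `offset`.
 * `out` must have room for len + SK_BRCM_TAG_LEN bytes and `offset`
 * must not exceed `len`. Returns the new frame length. */
static size_t brcm_insert_tag(const unsigned char *frame, size_t len,
			      size_t offset,
			      const unsigned char tag[SK_BRCM_TAG_LEN],
			      unsigned char *out)
{
	memcpy(out, frame, offset);
	memcpy(out + offset, tag, SK_BRCM_TAG_LEN);
	memcpy(out + offset + SK_BRCM_TAG_LEN, frame + offset, len - offset);
	return len + SK_BRCM_TAG_LEN;
}
```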
    • net: dsa: Pass a port to get_tag_protocol() · 5ed4e3eb
      Florian Fainelli committed
      A number of drivers want to check whether the configured CPU port is a
      possible configuration for enabling tagging. Pass down the CPU port
      number so they can verify that.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_red.c: work around gcc-4.4.4 anon union initializer issue · ee9d3429
      Andrew Morton committed
      gcc-4.4.4 (at least) has issues with initializers and anonymous unions:
      
      net/sched/sch_red.c: In function 'red_dump_offload':
      net/sched/sch_red.c:282: error: unknown field 'stats' specified in initializer
      net/sched/sch_red.c:282: warning: initialization makes integer from pointer without a cast
      net/sched/sch_red.c:283: error: unknown field 'stats' specified in initializer
      net/sched/sch_red.c:283: warning: initialization makes integer from pointer without a cast
      net/sched/sch_red.c: In function 'red_dump_stats':
      net/sched/sch_red.c:352: error: unknown field 'xstats' specified in initializer
      net/sched/sch_red.c:352: warning: initialization makes integer from pointer without a cast
      
      Work around this.
      
      Fixes: 602f3baf ("net_sch: red: Add offload ability to RED qdisc")
      Cc: Nogah Frankel <nogahf@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
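The class of problem in the commit above arises when a designated initializer names a member of an anonymous union. One common way to sidestep it, shown here as an assumption rather than as what the patch itself does, is to initialize only the ordinary named fields and assign the union member afterwards. The struct below is a hypothetical miniature, not the real offload structure.

```c
/* Hypothetical miniature of a struct with an anonymous union. */
struct offload_sketch {
	int command;
	union {			/* anonymous union (C11 / GNU extension) */
		int *stats;
		int *xstats;
	};
};

static struct offload_sketch make_offload(int command, int *stats)
{
	/* Only ordinary members appear in the initializer ... */
	struct offload_sketch opt = { .command = command };

	/* ... and the anonymous-union member is assigned separately, which
	 * avoids naming it in a designated initializer. */
	opt.stats = stats;
	return opt;
}
```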
    • net/mlx4: Use Kconfig flag to remove support of old gen2 Mellanox devices · a1b87145
      Slava Shwartsman committed
      Since Mellanox focus is on newer adapters, we would like to have the
      ability to disable the support for old gen2 adapters.
      
      This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag.
      We keep it turned on by default.
      Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_netlink: ensure that NLMSG_DONE never fails in dumps · 0642840b
      Jason A. Donenfeld committed
      The way people generally use netlink_dump is that they fill in the skb
      as much as possible, breaking when nla_put returns an error. Then, they
      get called again and start filling out the next skb, and again, and so
      forth. The mechanism at work here is the ability for the iterative
      dumping function to detect when the skb is filled up and not fill it
      past the brim, waiting for a fresh skb for the rest of the data.
      
      However, if the attributes are small and nicely packed, it is possible
      that a dump callback function successfully fills in attributes until the
      skb is of size 4080 (libmnl's default page-sized receive buffer size).
      The dump function completes, satisfied, and then, if it happens to be
      that this is actually the last skb, and no further ones are to be sent,
      then netlink_dump will add on the NLMSG_DONE part:
      
        nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
      
      It is very important that netlink_dump does this, of course. However, in
      this example, that call to nlmsg_put_answer will fail, because the
      previous filling by the dump function did not leave it enough room. And
      how could it possibly have done so? All of the nla_put variety of
      functions simply check to see if the skb has enough tailroom,
      independent of the context it is in.
      
      In order to keep the important assumptions of all netlink dump users, it
      is therefore important to give them an skb that has this end part of the
      tail already reserved, so that the call to nlmsg_put_answer does not
      fail. Otherwise, library authors are forced to find some bizarre sized
      receive buffer that has a large modulo relative to the common sizes of
      messages received, which is ugly and buggy.
      
      This patch thus saves the NLMSG_DONE for an additional message, for the
      case that things are dangerously close to the brim. This requires
      keeping track of the errno from ->dump() across calls.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
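The behaviour described in the commit above can be modelled without any netlink code: count how many messages a dump needs when the trailing NLMSG_DONE is deferred to an extra message whenever the last buffer has no room for it. This is a toy model under stated assumptions: fixed-size records, a hypothetical function name, and a DONE size of 20 bytes (a 16-byte netlink header plus a 4-byte integer payload).

```c
#include <stddef.h>

#define SK_DONE_SIZE 20	/* assumed: 16-byte header + 4-byte payload */

/* Number of messages needed to dump `records` fixed-size records into
 * buffers of `cap` bytes, including the message that carries NLMSG_DONE.
 * Returns 0 if a record could never fit. */
static unsigned int dump_message_count(size_t records, size_t record_size,
				       size_t cap)
{
	unsigned int msgs = 0;
	size_t left = records;

	if (record_size == 0 || record_size > cap)
		return 0;

	for (;;) {
		size_t used = 0;

		/* The dump callback fills the buffer until a record fails to fit. */
		while (left > 0 && used + record_size <= cap) {
			used += record_size;
			left--;
		}
		msgs++;
		if (left > 0)
			continue;		/* more data: next buffer */
		if (cap - used >= SK_DONE_SIZE)
			return msgs;		/* DONE fits in this message */
		return msgs + 1;		/* DONE deferred to its own message */
	}
}
```

With 8-byte records and a 4080-byte buffer, 510 records fill the buffer to the brim, so the model reports two messages: one full of data and one that carries only NLMSG_DONE.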
    • Merge branch 'netem-add-nsec-scheduling-and-slot-feature' · 907a4425
      David S. Miller committed
      Dave Taht says:
      
      ====================
      netem: add nsec scheduling and slot feature
      
      This patch series converts netem away from the old "ticks" interface and
      userspace API, and adds support for a new "slot" feature intended to
      emulate bursty MACs such as WiFi and LTE better.
      
      Changes since v2:
      Use u64 for packet_len_sched_time()
      Use simpler max(time_to_send,q->slot.slot_next)
      
      Changes since v1:
      Always pass new nanosecond APIs to userspace
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
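The two changelog items in the netem cover letter above (u64 arithmetic for the packet time, and taking the later of the send time and the next slot) can be sketched as two small helpers. The function names and the bytes-per-second rate unit are assumptions for illustration, not the netem code itself.

```c
#include <stdint.h>

#define SK_NSEC_PER_SEC 1000000000ull

/* Nanoseconds needed to transmit `len` bytes at `rate` bytes per second.
 * 64-bit arithmetic keeps the intermediate product from overflowing for
 * realistic packet sizes. `rate` must be non-zero. */
static uint64_t packet_len_to_ns(uint32_t len, uint64_t rate)
{
	return (uint64_t)len * SK_NSEC_PER_SEC / rate;
}

/* A packet may not leave before the next slot opens. */
static uint64_t slot_adjusted_send_time(uint64_t time_to_send,
					uint64_t slot_next)
{
	return time_to_send > slot_next ? time_to_send : slot_next;
}
```

At 125,000,000 bytes per second (1 Gbit/s), a 1500-byte packet takes 12,000 ns, a resolution that a coarse tick-based interface cannot express.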