1. 02 8月, 2017 4 次提交
  2. 30 7月, 2017 2 次提交
    • V
      net: ethtool: add support for forward error correction modes · 1a5f3da2
      Vidya Sagar Ravipati 提交于
      Forward Error Correction (FEC) modes i.e Base-R
      and Reed-Solomon modes are introduced in 25G/40G/100G standards
      for providing good BER at high speeds. Various networking devices
      which support 25G/40G/100G provides ability to manage supported FEC
      modes and the lack of FEC encoding control and reporting today is a
      source for interoperability issues for many vendors.
      FEC capability as well as specific FEC mode i.e. Base-R
      or RS modes can be requested or advertised through bits D44:47 of
      base link codeword.
      
      This patch set intends to provide option under ethtool to manage
      and report FEC encoding settings for networking devices as per
      IEEE 802.3 bj, bm and by specs.
      
      set-fec/show-fec option(s) are designed to provide control and
      report the FEC encoding on the link.
      
      SET FEC option:
      root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]
      
      Encoding: Types of encoding
      Off    :  Turning off any encoding
      RS     :  enforcing RS-FEC encoding on supported speeds
      BaseR  :  enforcing Base R encoding on supported speeds
      Auto   :  IEEE defaults for the speed/medium combination
      
      Here are a few examples of what we would expect if encoding=auto:
      - if autoneg is on, we are  expecting FEC to be negotiated as on or off
        as long as protocol supports it
      - if the hardware is capable of detecting the FEC encoding on it's
            receiver it will reconfigure its encoder to match
      - in absence of the above, the configuration would be set to IEEE
        defaults.
      
      >From our  understanding , this is essentially what most hardware/driver
      combinations are doing today in the absence of a way for users to
      control the behavior.
      
      SHOW FEC option:
      root@tor: ethtool --show-fec  swp1
      FEC parameters for swp1:
      Active FEC encodings: RS
      Configured FEC encodings:  RS | BaseR
      
      ETHTOOL DEVNAME output modification:
      
      ethtool devname output:
      root@tor:~# ethtool swp1
      Settings for swp1:
      root@hpe-7712-03:~# ethtool swp18
      Settings for swp18:
          Supported ports: [ FIBRE ]
          Supported link modes:   40000baseCR4/Full
                                  40000baseSR4/Full
                                  40000baseLR4/Full
                                  100000baseSR4/Full
                                  100000baseCR4/Full
                                  100000baseLR4_ER4/Full
          Supported pause frame use: No
          Supports auto-negotiation: Yes
          Supported FEC modes: [RS | BaseR | None | Not reported]
          Advertised link modes:  Not reported
          Advertised pause frame use: No
          Advertised auto-negotiation: No
          Advertised FEC modes: [RS | BaseR | None | Not reported]
      <<<< One or more FEC modes
          Speed: 100000Mb/s
          Duplex: Full
          Port: FIBRE
          PHYAD: 106
          Transceiver: internal
          Auto-negotiation: off
          Link detected: yes
      
      This patch includes following changes
      a) New ETHTOOL_SFECPARAM/SFECPARAM API, handled by
        the new get_fecparam/set_fecparam callbacks, provides support
        for configuration of forward error correction modes.
      b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
        are defined so that users can configure these fec modes for supported
        and advertising fields as part of link autonegotiation.
      Signed-off-by: NVidya Sagar Ravipati <vidya.chowdary@gmail.com>
      Signed-off-by: NDustin Byford <dustin@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a5f3da2
    • W
      net: check dev->addr_len for dev_set_mac_address() · 0254e0c6
      WANG Cong 提交于
      Historically, dev_ifsioc() uses struct sockaddr as mac
      address definition, this is why dev_set_mac_address()
      accepts a struct sockaddr pointer as input but now we
      have various types of mac addresse whose lengths
      are up to MAX_ADDR_LEN, longer than struct sockaddr,
      and saved in dev->addr_len.
      
      It is too late to fix dev_ifsioc() due to API
      compatibility, so just reject those larger than
      sizeof(struct sockaddr), otherwise we would read
      and use some random bytes from kernel stack.
      
      Fortunately, only a few IPv6 tunnel devices have addr_len
      larger than sizeof(struct sockaddr) and they don't support
      ndo_set_mac_addr(). But with team driver, in lb mode, they
      can still be enslaved to a team master and make its mac addr
      length as the same.
      
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0254e0c6
  3. 27 7月, 2017 1 次提交
  4. 25 7月, 2017 3 次提交
  5. 21 7月, 2017 1 次提交
  6. 20 7月, 2017 4 次提交
  7. 19 7月, 2017 1 次提交
  8. 18 7月, 2017 10 次提交
  9. 17 7月, 2017 1 次提交
  10. 14 7月, 2017 3 次提交
  11. 13 7月, 2017 2 次提交
    • M
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko 提交于
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
    • S
      datagram: fix kernel-doc comments · d3f6cd9e
      stephen hemminger 提交于
      An underscore in the kernel-doc comment section has special meaning
      and mis-use generates an errors.
      
      ./net/core/datagram.c:207: ERROR: Unknown target name: "msg".
      ./net/core/datagram.c:379: ERROR: Unknown target name: "msg".
      ./net/core/datagram.c:816: ERROR: Unknown target name: "t".
      Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3f6cd9e
  12. 08 7月, 2017 1 次提交
    • W
      bonding: avoid NETDEV_CHANGEMTU event when unregistering slave · f51048c3
      WANG Cong 提交于
      As Hongjun/Nicolas summarized in their original patch:
      
      "
      When a device changes from one netns to another, it's first unregistered,
      then the netns reference is updated and the dev is registered in the new
      netns. Thus, when a slave moves to another netns, it is first
      unregistered. This triggers a NETDEV_UNREGISTER event which is caught by
      the bonding driver. The driver calls bond_release(), which calls
      dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in
      the old netns).
      "
      
      This is a very special case, because the device is being unregistered
      no one should still care about the NETDEV_CHANGEMTU event triggered
      at this point, we can avoid broadcasting this event on this path,
      and avoid touching inetdev_event()/addrconf_notify() path.
      
      It requires to export __dev_set_mtu() to bonding driver.
      Reported-by: NHongjun Li <hongjun.li@6wind.com>
      Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f51048c3
  13. 05 7月, 2017 1 次提交
  14. 03 7月, 2017 6 次提交
    • E
      net: avoid one splat in fib_nl_delrule() · 5361e209
      Eric Dumazet 提交于
      We need to use refcount_set() on a newly created rule to avoid
      following error :
      
      [   64.601749] ------------[ cut here ]------------
      [   64.601757] WARNING: CPU: 0 PID: 6476 at lib/refcount.c:184 refcount_sub_and_test+0x75/0xa0
      [   64.601758] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
      [   64.601769] CPU: 0 PID: 6476 Comm: ip Tainted: G        W       4.12.0-smp-DEV #274
      [   64.601771] task: ffff8837bf482040 task.stack: ffff8837bdc08000
      [   64.601773] RIP: 0010:refcount_sub_and_test+0x75/0xa0
      [   64.601774] RSP: 0018:ffff8837bdc0f5c0 EFLAGS: 00010286
      [   64.601776] RAX: 0000000000000026 RBX: 0000000000000001 RCX: 0000000000000000
      [   64.601777] RDX: 0000000000000026 RSI: 0000000000000096 RDI: ffffed06f7b81eae
      [   64.601778] RBP: ffff8837bdc0f5d0 R08: 0000000000000004 R09: fffffbfff4a54c25
      [   64.601779] R10: 00000000cbc500e5 R11: ffffffffa52a6128 R12: ffff881febcf6f24
      [   64.601779] R13: ffff881fbf4eaf00 R14: ffff881febcf6f80 R15: ffff8837d7a4ed00
      [   64.601781] FS:  00007ff5a2f6b700(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000
      [   64.601782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   64.601783] CR2: 00007ffcdc70d000 CR3: 0000001f9c91e000 CR4: 00000000001406f0
      [   64.601783] Call Trace:
      [   64.601786]  refcount_dec_and_test+0x11/0x20
      [   64.601790]  fib_nl_delrule+0xc39/0x1630
      [   64.601793]  ? is_bpf_text_address+0xe/0x20
      [   64.601795]  ? fib_nl_newrule+0x25e0/0x25e0
      [   64.601798]  ? depot_save_stack+0x133/0x470
      [   64.601801]  ? ns_capable+0x13/0x20
      [   64.601803]  ? __netlink_ns_capable+0xcc/0x100
      [   64.601806]  rtnetlink_rcv_msg+0x23a/0x6a0
      [   64.601808]  ? rtnl_newlink+0x1630/0x1630
      [   64.601811]  ? memset+0x31/0x40
      [   64.601813]  netlink_rcv_skb+0x2d7/0x440
      [   64.601815]  ? rtnl_newlink+0x1630/0x1630
      [   64.601816]  ? netlink_ack+0xaf0/0xaf0
      [   64.601818]  ? kasan_unpoison_shadow+0x35/0x50
      [   64.601820]  ? __kmalloc_node_track_caller+0x4c/0x70
      [   64.601821]  rtnetlink_rcv+0x28/0x30
      [   64.601823]  netlink_unicast+0x422/0x610
      [   64.601824]  ? netlink_attachskb+0x650/0x650
      [   64.601826]  netlink_sendmsg+0x7b7/0xb60
      [   64.601828]  ? netlink_unicast+0x610/0x610
      [   64.601830]  ? netlink_unicast+0x610/0x610
      [   64.601832]  sock_sendmsg+0xba/0xf0
      [   64.601834]  ___sys_sendmsg+0x6a9/0x8c0
      [   64.601835]  ? copy_msghdr_from_user+0x520/0x520
      [   64.601837]  ? __alloc_pages_nodemask+0x160/0x520
      [   64.601839]  ? memcg_write_event_control+0xd60/0xd60
      [   64.601841]  ? __alloc_pages_slowpath+0x1d50/0x1d50
      [   64.601843]  ? kasan_slab_free+0x71/0xc0
      [   64.601845]  ? mem_cgroup_commit_charge+0xb2/0x11d0
      [   64.601847]  ? lru_cache_add_active_or_unevictable+0x7d/0x1a0
      [   64.601849]  ? __handle_mm_fault+0x1af8/0x2810
      [   64.601851]  ? may_open_dev+0xc0/0xc0
      [   64.601852]  ? __pmd_alloc+0x2c0/0x2c0
      [   64.601853]  ? __fdget+0x13/0x20
      [   64.601855]  __sys_sendmsg+0xc6/0x150
      [   64.601856]  ? __sys_sendmsg+0xc6/0x150
      [   64.601857]  ? SyS_shutdown+0x170/0x170
      [   64.601859]  ? handle_mm_fault+0x28a/0x650
      [   64.601861]  SyS_sendmsg+0x12/0x20
      [   64.601863]  entry_SYSCALL_64_fastpath+0x13/0x94
      
      Fixes: 717d1e99 ("net: convert fib_rule.refcnt from atomic_t to refcount_t")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5361e209
    • A
      net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64 · 9af9959e
      Alban Browaeys 提交于
      commit 9256645a ("net/core: relax BUILD_BUG_ON in
      netdev_stats_to_stats64") made an attempt to read beyond
      the size of the source a possibility.
      
      Fix to only copy src size to dest. As dest might be bigger than src.
      
       ==================================================================
       BUG: KASAN: slab-out-of-bounds in netdev_stats_to_stats64+0xe/0x30 at addr ffff8801be248b20
       Read of size 192 by task VBoxNetAdpCtl/6734
       CPU: 1 PID: 6734 Comm: VBoxNetAdpCtl Tainted: G           O    4.11.4prahal+intel+ #118
       Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET52WW (1.32 ) 05/04/2017
       Call Trace:
        dump_stack+0x63/0x86
        kasan_object_err+0x1c/0x70
        kasan_report+0x270/0x520
        ? netdev_stats_to_stats64+0xe/0x30
        ? sched_clock_cpu+0x1b/0x190
        ? __module_address+0x3e/0x3b0
        ? unwind_next_frame+0x1ea/0xb00
        check_memory_region+0x13c/0x1a0
        memcpy+0x23/0x50
        netdev_stats_to_stats64+0xe/0x30
        dev_get_stats+0x1b9/0x230
        rtnl_fill_stats+0x44/0xc00
        ? nla_put+0xc6/0x130
        rtnl_fill_ifinfo+0xe9e/0x3700
        ? rtnl_fill_vfinfo+0xde0/0xde0
        ? sched_clock+0x9/0x10
        ? sched_clock+0x9/0x10
        ? sched_clock_local+0x120/0x130
        ? __module_address+0x3e/0x3b0
        ? unwind_next_frame+0x1ea/0xb00
        ? sched_clock+0x9/0x10
        ? sched_clock+0x9/0x10
        ? sched_clock_cpu+0x1b/0x190
        ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
        ? depot_save_stack+0x1d8/0x4a0
        ? depot_save_stack+0x34f/0x4a0
        ? depot_save_stack+0x34f/0x4a0
        ? save_stack+0xb1/0xd0
        ? save_stack_trace+0x16/0x20
        ? save_stack+0x46/0xd0
        ? kasan_slab_alloc+0x12/0x20
        ? __kmalloc_node_track_caller+0x10d/0x350
        ? __kmalloc_reserve.isra.36+0x2c/0xc0
        ? __alloc_skb+0xd0/0x560
        ? rtmsg_ifinfo_build_skb+0x61/0x120
        ? rtmsg_ifinfo.part.25+0x16/0xb0
        ? rtmsg_ifinfo+0x47/0x70
        ? register_netdev+0x15/0x30
        ? vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp]
        ? vboxNetAdpCreate+0x210/0x400 [vboxnetadp]
        ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
        ? do_vfs_ioctl+0x17f/0xff0
        ? SyS_ioctl+0x74/0x80
        ? do_syscall_64+0x182/0x390
        ? __alloc_skb+0xd0/0x560
        ? __alloc_skb+0xd0/0x560
        ? save_stack_trace+0x16/0x20
        ? init_object+0x64/0xa0
        ? ___slab_alloc+0x1ae/0x5c0
        ? ___slab_alloc+0x1ae/0x5c0
        ? __alloc_skb+0xd0/0x560
        ? sched_clock+0x9/0x10
        ? kasan_unpoison_shadow+0x35/0x50
        ? kasan_kmalloc+0xad/0xe0
        ? __kmalloc_node_track_caller+0x246/0x350
        ? __alloc_skb+0xd0/0x560
        ? kasan_unpoison_shadow+0x35/0x50
        ? memset+0x31/0x40
        ? __alloc_skb+0x31f/0x560
        ? napi_consume_skb+0x320/0x320
        ? br_get_link_af_size_filtered+0xb7/0x120 [bridge]
        ? if_nlmsg_size+0x440/0x630
        rtmsg_ifinfo_build_skb+0x83/0x120
        rtmsg_ifinfo.part.25+0x16/0xb0
        rtmsg_ifinfo+0x47/0x70
        register_netdevice+0xa2b/0xe50
        ? __kmalloc+0x171/0x2d0
        ? netdev_change_features+0x80/0x80
        register_netdev+0x15/0x30
        vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp]
        vboxNetAdpCreate+0x210/0x400 [vboxnetadp]
        ? vboxNetAdpComposeMACAddress+0x1d0/0x1d0 [vboxnetadp]
        ? kasan_check_write+0x14/0x20
        VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
        ? VBoxNetAdpLinuxOpen+0x20/0x20 [vboxnetadp]
        ? lock_acquire+0x11c/0x270
        ? __audit_syscall_entry+0x2fb/0x660
        do_vfs_ioctl+0x17f/0xff0
        ? __audit_syscall_entry+0x2fb/0x660
        ? ioctl_preallocate+0x1d0/0x1d0
        ? __audit_syscall_entry+0x2fb/0x660
        ? kmem_cache_free+0xb2/0x250
        ? syscall_trace_enter+0x537/0xd00
        ? exit_to_usermode_loop+0x100/0x100
        SyS_ioctl+0x74/0x80
        ? do_sys_open+0x350/0x350
        ? do_vfs_ioctl+0xff0/0xff0
        do_syscall_64+0x182/0x390
        entry_SYSCALL64_slow_path+0x25/0x25
       RIP: 0033:0x7f7e39a1ae07
       RSP: 002b:00007ffc6f04c6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
       RAX: ffffffffffffffda RBX: 00007ffc6f04c730 RCX: 00007f7e39a1ae07
       RDX: 00007ffc6f04c730 RSI: 00000000c0207601 RDI: 0000000000000007
       RBP: 00007ffc6f04c700 R08: 00007ffc6f04c780 R09: 0000000000000008
       R10: 0000000000000541 R11: 0000000000000206 R12: 0000000000000007
       R13: 00000000c0207601 R14: 00007ffc6f04c730 R15: 0000000000000012
       Object at ffff8801be248008, in cache kmalloc-4096 size: 4096
       Allocated:
       PID = 6734
        save_stack_trace+0x16/0x20
        save_stack+0x46/0xd0
        kasan_kmalloc+0xad/0xe0
        __kmalloc+0x171/0x2d0
        alloc_netdev_mqs+0x8a7/0xbe0
        vboxNetAdpOsCreate+0x65/0x1c0 [vboxnetadp]
        vboxNetAdpCreate+0x210/0x400 [vboxnetadp]
        VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
        do_vfs_ioctl+0x17f/0xff0
        SyS_ioctl+0x74/0x80
        do_syscall_64+0x182/0x390
        return_from_SYSCALL_64+0x0/0x6a
       Freed:
       PID = 5600
        save_stack_trace+0x16/0x20
        save_stack+0x46/0xd0
        kasan_slab_free+0x73/0xc0
        kfree+0xe4/0x220
        kvfree+0x25/0x30
        single_release+0x74/0xb0
        __fput+0x265/0x6b0
        ____fput+0x9/0x10
        task_work_run+0xd5/0x150
        exit_to_usermode_loop+0xe2/0x100
        do_syscall_64+0x26c/0x390
        return_from_SYSCALL_64+0x0/0x6a
       Memory state around the buggy address:
        ffff8801be248a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ffff8801be248b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       >ffff8801be248b80: 00 00 00 00 00 00 00 00 00 00 00 07 fc fc fc fc
                                                           ^
        ffff8801be248c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff8801be248c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ==================================================================
      Signed-off-by: NAlban Browaeys <alban.browaeys@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9af9959e
    • D
      bpf: simplify narrower ctx access · f96da094
      Daniel Borkmann 提交于
      This work tries to make the semantics and code around the
      narrower ctx access a bit easier to follow. Right now
      everything is done inside the .is_valid_access(). Offset
      matching is done differently for read/write types, meaning
      writes don't support narrower access and thus matching only
      on offsetof(struct foo, bar) is enough whereas for read
      case that supports narrower access we must check for
      offsetof(struct foo, bar) + offsetof(struct foo, bar) +
      sizeof(<bar>) - 1 for each of the cases. For read cases of
      individual members that don't support narrower access (like
      packet pointers or skb->cb[] case which has its own narrow
      access logic), we check as usual only offsetof(struct foo,
      bar) like in write case. Then, for the case where narrower
      access is allowed, we also need to set the aux info for the
      access. Meaning, ctx_field_size and converted_op_size have
      to be set. First is the original field size e.g. sizeof(<bar>)
      as in above example from the user facing ctx, and latter
      one is the target size after actual rewrite happened, thus
      for the kernel facing ctx. Also here we need the range match
      and we need to keep track changing convert_ctx_access() and
      converted_op_size from is_valid_access() as both are not at
      the same location.
      
      We can simplify the code a bit: check_ctx_access() becomes
      simpler in that we only store ctx_field_size as a meta data
      and later in convert_ctx_accesses() we fetch the target_size
      right from the location where we do convert. Should the verifier
      be misconfigured we do reject for BPF_WRITE cases or target_size
      that are not provided. For the subsystems, we always work on
      ranges in is_valid_access() and add small helpers for ranges
      and narrow access, convert_ctx_accesses() sets target_size
      for the relevant instruction.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Cc: Yonghong Song <yhs@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f96da094
    • D
      bpf: add bpf_skb_adjust_room helper · 2be7e212
      Daniel Borkmann 提交于
      This work adds a helper that can be used to adjust net room of an
      skb. The helper is generic and can be further extended in future.
      Main use case is for having a programmatic way to add/remove room to
      v4/v6 header options along with cls_bpf on egress and ingress hook
      of the data path. It reuses most of the infrastructure that we added
      for the bpf_skb_change_type() helper which can be used in nat64
      translations. Similarly, the helper only takes care of adjusting the
      room so that related data is populated and csum adapted out of the
      BPF program using it.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2be7e212
    • D
      bpf, net: add skb_mac_header_len helper · 0daf4349
      Daniel Borkmann 提交于
      Add a small skb_mac_header_len() helper similarly as the
      skb_network_header_len() we have and replace open coded
      places in BPF's bpf_skb_change_proto() helper. Will also
      be used in upcoming work.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0daf4349
    • L
      bpf: fix to bpf_setsockops · a5192c52
      Lawrence Brakmo 提交于
      Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1
      statement up).
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5192c52