1. 29 9月, 2017 6 次提交
  2. 28 9月, 2017 1 次提交
    • K
      timer: Prepare to change timer callback argument type · 686fef92
      Kees Cook 提交于
      Modern kernel callback systems pass the structure associated with a
      given callback to the callback function. The timer callback remains one
      of the legacy cases where an arbitrary unsigned long argument continues
      to be passed as the callback argument. This has several problems:
      
      - This bloats the timer_list structure with a normally redundant
        .data field.
      
      - No type checking is being performed, forcing callbacks to do
        explicit type casts of the unsigned long argument into the object
        that was passed, rather than using container_of(), as done in most
        of the other callback infrastructure.
      
      - Neighboring buffer overflows can overwrite both the .function and
        the .data field, providing attackers with a way to elevate from a buffer
        overflow into a simplistic ROP-like mechanism that allows calling
        arbitrary functions with a controlled first argument.
      
      - For future Control Flow Integrity work, this creates a unique function
        prototype for timer callbacks, instead of allowing them to continue to
        be clustered with other void functions that take a single unsigned long
        argument.
      
      This adds a new timer initialization API, which will ultimately replace
      the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family,
      named timer_setup() (to mirror hrtimer_setup(), making instances of its
      use much easier to grep for).
      
      In order to support the migration of existing timers into the new
      callback arguments, timer_setup() casts its arguments to the existing
      legacy types, and explicitly passes the timer pointer as the legacy
      data argument. Once all setup_*timer() callers have been replaced with
      timer_setup(), the casts can be removed, and the data argument can be
      dropped with the timer expiration code changed to just pass the timer
      to the callback directly.
      
      Since the regular pattern of using container_of() during local variable
      declaration repeats the need for the variable type declaration
      to be included, this adds a helper modeled after other from_*()
      helpers that wrap container_of(), named from_timer(). This helper uses
      typeof(*variable), removing the type redundancy and minimizing the need
      for line wraps in forthcoming conversions from "unsigned data long" to
      "struct timer_list *" in the timer callbacks:
      
      -void callback(unsigned long data)
      +void callback(struct timer_list *t)
      {
      -   struct some_data_structure *local = (struct some_data_structure *)data;
      +   struct some_data_structure *local = from_timer(local, t, timer);
      
      Finally, in order to support the handful of timer users that perform
      open-coded assignments of the .function (and .data) fields, provide
      cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used
      temporarily. Once conversion has been completed, these can be globally
      trivially removed.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
      686fef92
  3. 27 9月, 2017 1 次提交
  4. 26 9月, 2017 4 次提交
  5. 25 9月, 2017 6 次提交
  6. 22 9月, 2017 3 次提交
    • E
      net: prevent dst uses after free · 222d7dbd
      Eric Dumazet 提交于
      In linux-4.13, Wei worked hard to convert dst to a traditional
      refcounted model, removing GC.
      
      We now want to make sure a dst refcount can not transition from 0 back
      to 1.
      
      The problem here is that input path attached a not refcounted dst to an
      skb. Then later, because packet is forwarded and hits skb_dst_force()
      before exiting RCU section, we might try to take a refcount on one dst
      that is about to be freed, if another cpu saw 1 -> 0 transition in
      dst_release() and queued the dst for freeing after one RCU grace period.
      
      Lets unify skb_dst_force() and skb_dst_force_safe(), since we should
      always perform the complete check against dst refcount, and not assume
      it is not zero.
      
      Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005
      
      [  989.919496]  skb_dst_force+0x32/0x34
      [  989.919498]  __dev_queue_xmit+0x1ad/0x482
      [  989.919501]  ? eth_header+0x28/0xc6
      [  989.919502]  dev_queue_xmit+0xb/0xd
      [  989.919504]  neigh_connected_output+0x9b/0xb4
      [  989.919507]  ip_finish_output2+0x234/0x294
      [  989.919509]  ? ipt_do_table+0x369/0x388
      [  989.919510]  ip_finish_output+0x12c/0x13f
      [  989.919512]  ip_output+0x53/0x87
      [  989.919513]  ip_forward_finish+0x53/0x5a
      [  989.919515]  ip_forward+0x2cb/0x3e6
      [  989.919516]  ? pskb_trim_rcsum.part.9+0x4b/0x4b
      [  989.919518]  ip_rcv_finish+0x2e2/0x321
      [  989.919519]  ip_rcv+0x26f/0x2eb
      [  989.919522]  ? vlan_do_receive+0x4f/0x289
      [  989.919523]  __netif_receive_skb_core+0x467/0x50b
      [  989.919526]  ? tcp_gro_receive+0x239/0x239
      [  989.919529]  ? inet_gro_receive+0x226/0x238
      [  989.919530]  __netif_receive_skb+0x4d/0x5f
      [  989.919532]  netif_receive_skb_internal+0x5c/0xaf
      [  989.919533]  napi_gro_receive+0x45/0x81
      [  989.919536]  ixgbe_poll+0xc8a/0xf09
      [  989.919539]  ? kmem_cache_free_bulk+0x1b6/0x1f7
      [  989.919540]  net_rx_action+0xf4/0x266
      [  989.919543]  __do_softirq+0xa8/0x19d
      [  989.919545]  irq_exit+0x5d/0x6b
      [  989.919546]  do_IRQ+0x9c/0xb5
      [  989.919548]  common_interrupt+0x93/0x93
      [  989.919548]  </IRQ>
      
      Similarly dst_clone() can use dst_hold() helper to have additional
      debugging, as a follow up to commit 44ebe791 ("net: add debug
      atomic_inc_not_zero() in dst_hold()")
      
      In net-next we will convert dst atomic_t to refcount_t for peace of
      mind.
      
      Fixes: a4c2fd7f ("net: remove DST_NOCACHE flag")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Reported-by: NPaweł Staszewski <pstaszewski@itcare.pl>
      Bisected-by: NPaweł Staszewski <pstaszewski@itcare.pl>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      222d7dbd
    • D
      Input: uinput - avoid FF flush when destroying device · e8b95728
      Dmitry Torokhov 提交于
      Normally, when input device supporting force feedback effects is being
      destroyed, we try to "flush" currently playing effects, so that the
      physical device does not continue vibrating (or executing other effects).
      Unfortunately this does not work well for uinput as flushing of the effects
      deadlocks with the destroy action:
      
      - if device is being destroyed because the file descriptor is being closed,
        then there is noone to even service FF requests;
      
      - if device is being destroyed because userspace sent UI_DEV_DESTROY,
        while theoretically it could be possible to service FF requests,
        userspace is unlikely to do so (they'd need to make sure FF handling
        happens on a separate thread) even if kernel solves the issue with FF
        ioctls deadlocking with UI_DEV_DESTROY ioctl on udev->mutex.
      
      To avoid lockups like the one below, let's install a custom input device
      flush handler, and avoid trying to flush force feedback effects when we
      destroying the device, and instead rely on uinput to shut off the device
      properly.
      
      NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
      ...
       <<EOE>>  [<ffffffff817a0307>] _raw_spin_lock_irqsave+0x37/0x40
       [<ffffffff810e633d>] complete+0x1d/0x50
       [<ffffffffa00ba08c>] uinput_request_done+0x3c/0x40 [uinput]
       [<ffffffffa00ba587>] uinput_request_submit.part.7+0x47/0xb0 [uinput]
       [<ffffffffa00bb62b>] uinput_dev_erase_effect+0x5b/0x76 [uinput]
       [<ffffffff815d91ad>] erase_effect+0xad/0xf0
       [<ffffffff815d929d>] flush_effects+0x4d/0x90
       [<ffffffff815d4cc0>] input_flush_device+0x40/0x60
       [<ffffffff815daf1c>] evdev_cleanup+0xac/0xc0
       [<ffffffff815daf5b>] evdev_disconnect+0x2b/0x60
       [<ffffffff815d74ac>] __input_unregister_device+0xac/0x150
       [<ffffffff815d75f7>] input_unregister_device+0x47/0x70
       [<ffffffffa00bac45>] uinput_destroy_device+0xb5/0xc0 [uinput]
       [<ffffffffa00bb2de>] uinput_ioctl_handler.isra.9+0x65e/0x740 [uinput]
       [<ffffffff811231ab>] ? do_futex+0x12b/0xad0
       [<ffffffffa00bb3f8>] uinput_ioctl+0x18/0x20 [uinput]
       [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
       [<ffffffff81337553>] ? security_file_ioctl+0x43/0x60
       [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
       [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
      Reported-by: NRodrigo Rivas Costa <rodrigorivascosta@gmail.com>
      Reported-by: NClément VUCHENER <clement.vuchener@gmail.com>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=193741Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      e8b95728
    • F
      net: ethtool: Add back transceiver type · 19cab887
      Florian Fainelli 提交于
      Commit 3f1ac7a7 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
      deprecated the ethtool_cmd::transceiver field, which was fine in
      premise, except that the PHY library was actually using it to report the
      type of transceiver: internal or external.
      
      Use the first word of the reserved field to put this __u8 transceiver
      field back in. It is made read-only, and we don't expect the
      ETHTOOL_xLINKSETTINGS API to be doing anything with this anyway, so this
      is mostly for the legacy path where we do:
      
      ethtool_get_settings()
      -> dev->ethtool_ops->get_link_ksettings()
         -> convert_link_ksettings_to_legacy_settings()
      
      to have no information loss compared to the legacy get_settings API.
      
      Fixes: 3f1ac7a7 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19cab887
  7. 21 9月, 2017 2 次提交
  8. 20 9月, 2017 2 次提交
    • J
      ACPI / bus: Make ACPI_HANDLE() work for non-GPL code again · 9e987b70
      John Hubbard 提交于
      Due to commit db3e50f3 (device property: Get rid of struct
      fwnode_handle type field), ACPI_HANDLE() inadvertently became
      a GPL-only call. The call path that led to that was:
      
      ACPI_HANDLE()
          ACPI_COMPANION()
              to_acpi_device_node()
                  is_acpi_device_node()
                      acpi_device_fwnode_ops
                          DECLARE_ACPI_FWNODE_OPS(acpi_device_fwnode_ops);
      
      ...and the new DECLARE_ACPI_FWNODE_OPS() includes
      EXPORT_SYMBOL_GPL, whereas previously it was a static struct.
      
      In order to avoid changing any of that, let's instead provide ever
      so slightly better encapsulation of those struct fwnode_operations
      instances. Those do not really need to be directly used in
      inline function calls in header files. Simply moving two small
      functions (is_acpi_device_node and is_acpi_data_node) out of
      acpi_bus.h, and into a .c file, does that.
      
      That leaves the internals of struct fwnode_operations as GPL-only
      (which I think was the intent all along), but un-breaks any driver
      code out there that relies on the ACPI subsystem's being (historically)
      an EXPORT_SYMBOL-usable system. By that, I mean, ACPI_HANDLE() and
      other basic ACPI calls were non-GPL-protected.
      
      Also, while I'm there, remove a tiny bit of redundancy that was missed
      in the earlier commit, by having is_acpi_node() use the other two
      routines, instead of checking fwnode directly.
      
      Fixes: db3e50f3 (device property: Get rid of struct fwnode_handle type field)
      Signed-off-by: NJohn Hubbard <jhubbard@nvidia.com>
      Acked-by: NSakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9e987b70
    • A
      of: provide inline helper for of_find_device_by_node · aa767cfb
      Arnd Bergmann 提交于
      The ipmmu-vmsa driver fails in compile-testing on non-OF platforms:
      
      drivers/iommu/ipmmu-vmsa.o: In function `ipmmu_of_xlate':
      ipmmu-vmsa.c:(.text+0x740): undefined reference to `of_find_device_by_node'
      
      It would be reasonable to assume that this interface works but
      returns failure on non-OF builds, like it does on machines that
      have been booted in another way, so this adds another inline
      function helper.
      
      Fixes: 7b2d5961 ("iommu/ipmmu-vmsa: Replace local utlb code with fwspec ids")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRob Herring <robh@kernel.org>
      aa767cfb
  9. 19 9月, 2017 2 次提交
  10. 18 9月, 2017 2 次提交
  11. 16 9月, 2017 1 次提交
    • X
      sctp: fix an use-after-free issue in sctp_sock_dump · d25adbeb
      Xin Long 提交于
      Commit 86fdb344 ("sctp: ensure ep is not destroyed before doing the
      dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep
      with holding sock and sock lock.
      
      But Paolo noticed that endpoint could be destroyed in sctp_rcv without
      sock lock protection. It means the use-after-free issue still could be
      triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks
      !ep, although it's pretty hard to reproduce.
      
      I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close
      and sctp_sock_dump long time.
      
      This patch is to add another param cb_done to sctp_for_each_transport
      and dump ep->assocs with holding tsp after jumping out of transport's
      traversal in it to avoid this issue.
      
      It can also improve sctp diag dump to make it run faster, as no need
      to save sk into cb->args[5] and keep calling sctp_for_each_transport
      any more.
      
      This patch is also to use int * instead of int for the pos argument
      in sctp_for_each_transport, which could make postion increment only
      in sctp_for_each_transport and no need to keep changing cb->args[2]
      in sctp_sock_filter and sctp_sock_dump any more.
      
      Fixes: 86fdb344 ("sctp: ensure ep is not destroyed before doing the dump")
      Reported-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d25adbeb
  12. 15 9月, 2017 5 次提交
  13. 14 9月, 2017 2 次提交
    • M
      mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko 提交于
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing caller provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      motivates for cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with highlevel semantic should be clearly defined to prevent from
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      replace all existing users to simply use GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
      so they will be placed properly for memory fragmentation prevention.
      
      I can see reasons we might want some gfp flag to reflect shorterm
      allocations but I propose starting from a clear semantic definition and
      only then add users with proper justification.
      
      This was been brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ee931c4
    • D
      sctp: potential read out of bounds in sctp_ulpevent_type_enabled() · fa5f7b51
      Dan Carpenter 提交于
      This code causes a static checker warning because Smatch doesn't trust
      anything that comes from skb->data.  I've reviewed this code and I do
      think skb->data can be controlled by the user here.
      
      The sctp_event_subscribe struct has 13 __u8 fields and we want to see
      if ours is non-zero.  sn_type can be any value in the 0-USHRT_MAX range.
      We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read
      either before the start of the struct or after the end.
      
      This is a very old bug and it's surprising that it would go undetected
      for so long but my theory is that it just doesn't have a big impact so
      it would be hard to notice.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa5f7b51
  14. 13 9月, 2017 1 次提交
    • C
      net_sched: get rid of tcfa_rcu · d7fb60b9
      Cong Wang 提交于
      gen estimator has been rewritten in commit 1c0d32fd
      ("net_sched: gen_estimator: complete rewrite of rate estimators"),
      the caller is no longer needed to wait for a grace period.
      So this patch gets rid of it.
      
      This also completely closes a race condition between action free
      path and filter chain add/remove path for the following patch.
      Because otherwise the nested RCU callback can't be caught by
      rcu_barrier().
      
      Please see also the comments in code.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7fb60b9
  15. 12 9月, 2017 2 次提交
    • J
      xdp: implement xdp_redirect_map for generic XDP · 96c5508e
      Jesper Dangaard Brouer 提交于
      Using bpf_redirect_map is allowed for generic XDP programs, but the
      appropriate map lookup was never performed in xdp_do_generic_redirect().
      
      Instead the map-index is directly used as the ifindex.  For the
      xdp_redirect_map sample in SKB-mode '-S', this resulted in trying
      sending on ifindex 0 which isn't valid, resulting in getting SKB
      packets dropped.  Thus, the reported performance numbers are wrong in
      commit 24251c26 ("samples/bpf: add option for native and skb mode
      for redirect apps") for the 'xdp_redirect_map -S' case.
      
      Before commit 109980b8 ("bpf: don't select potentially stale
      ri->map from buggy xdp progs") it could crash the kernel.  Like this
      commit also check that the map_owner owner is correct before
      dereferencing the map pointer.  But make sure that this API misusage
      can be caught by a tracepoint. Thus, allowing userspace via
      tracepoints to detect misbehaving bpf_progs.
      
      Fixes: 6103aa96 ("net: implement XDP_REDIRECT for xdp generic")
      Fixes: 24251c26 ("samples/bpf: add option for native and skb mode for redirect apps")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96c5508e
    • Y
      perf/bpf: fix a clang compilation issue · 609320c8
      Yonghong Song 提交于
      clang does not support variable length array for structure member.
      It has the following error during compilation:
      
      kernel/trace/trace_syscalls.c:568:17: error: fields must have a constant size:
      'variable length array in structure' extension will never be supported
                      unsigned long args[sys_data->nb_args];
                                    ^
      
      The fix is to use a fixed array length instead.
      Reported-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      609320c8