1. 15 12月, 2015 4 次提交
  2. 14 12月, 2015 1 次提交
  3. 13 12月, 2015 2 次提交
  4. 12 12月, 2015 1 次提交
  5. 11 12月, 2015 1 次提交
  6. 10 12月, 2015 1 次提交
    • S
      bitops.h: correctly handle rol32 with 0 byte shift · d7e35dfa
      Sasha Levin 提交于
      ROL on a 32 bit integer with a shift of 32 or more is undefined and the
      result is arch-dependent. Avoid this by handling the trivial case of
      roling by 0 correctly.
      
      The trivial solution of checking if shift is 0 breaks gcc's detection
      of this code as a ROL instruction, which is unacceptable.
      
      This bug was reported and fixed in GCC
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):
      
      	The standard rotate idiom,
      
      	  (x << n) | (x >> (32 - n))
      
      	is recognized by gcc (for concreteness, I discuss only the case that x
      	is an uint32_t here).
      
      	However, this is portable C only for n in the range 0 < n < 32. For n
      	== 0, we get x >> 32 which gives undefined behaviour according to the
      	C standard (6.5.7, Bitwise shift operators). To portably support n ==
      	0, one has to write the rotate as something like
      
      	  (x << n) | (x >> ((-n) & 31))
      
      	And this is apparently not recognized by gcc.
      
      Note that this is broken on older GCCs and will result in slower ROL.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7e35dfa
  7. 09 12月, 2015 4 次提交
  8. 08 12月, 2015 3 次提交
  9. 07 12月, 2015 2 次提交
  10. 04 12月, 2015 5 次提交
    • A
      Revert: "vfio: Include No-IOMMU mode" · ae5515d6
      Alex Williamson 提交于
      Revert commit 033291ec ("vfio: Include No-IOMMU mode") due to lack
      of a user.  This was originally intended to fill a need for the DPDK
      driver, but uptake has been slow so rather than support an unproven
      kernel interface revert it and revisit when userspace catches up.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      ae5515d6
    • D
      drm/nouveau: Fix pre-nv50 pageflip events (v4) · bbc8764f
      Daniel Vetter 提交于
      Apparently pre-nv50 pageflip events happen before the actual vblank
      period. Therefore that functionality got semi-disabled in
      
      commit af4870e4
      Author: Mario Kleiner <mario.kleiner.de@gmail.com>
      Date:   Tue May 13 00:42:08 2014 +0200
      
          drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
      
      Unfortunately that hack got uprooted in
      
      commit cc1ef118
      Author: Thierry Reding <treding@nvidia.com>
      Date:   Wed Aug 12 17:00:31 2015 +0200
      
          drm/irq: Make pipe unsigned and name consistent
      
      Triggering a warning when trying to sample the vblank timestamp for a
      non-existing pipe. There's a few ways to fix this:
      
      - Open-code the old behaviour, which just enshrines this slight
        breakage of the userspace ABI.
      
      - Revert Mario's commit and again inflict broken timestamps, again not
        pretty.
      
      - Fix this for real by delaying the pageflip TS until the next vblank
        interrupt, thereby making it accurate.
      
      This patch implements the third option. Since having a page flip
      interrupt that happens when the pageflip gets armed and not when it
      completes in the next vblank seems to be fairly common (older i915 hw
      works very similarly) create a new helper to arm vblank events for
      such drivers.
      
      v2 (Mario Kleiner):
      - Fix function prototypes in drmP.h
      - Add missing vblank_put() for pageflip completion without
        pageflip event.
      - Initialize sequence number for queued pageflip event to avoid
        trouble in drm_handle_vblank_events().
      - Remove dead code and spelling fix.
      
      v3 (Mario Kleiner):
      - Add a signed-off-by and cc stable tag per Ilja's advice.
      
      v4 (Thierry Reding):
      - Fix kerneldoc typo, discovered by Michel Dänzer
      - Rearrange tags and changelog
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
      Acked-by: NBen Skeggs <bskeggs@redhat.com>
      Cc: Ilia Mirkin <imirkin@alum.mit.edu>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Reviewed-by: NMario Kleiner <mario.kleiner.de@gmail.com>
      Cc: stable@vger.kernel.org # v4.3
      Signed-off-by: NMario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      bbc8764f
    • T
      drm: Fix an unwanted master inheritance v2 · a0af2e53
      Thomas Hellstrom 提交于
      A client calling drmSetMaster() using a file descriptor that was opened
      when another client was master would inherit the latter client's master
      object and all its authenticated clients.
      
      This is unwanted behaviour, and when this happens, instead allocate a
      brand new master object for the client calling drmSetMaster().
      
      Fixes a BUG() throw in vmw_master_set().
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      a0af2e53
    • E
      net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84
      Eric Dumazet 提交于
      qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
      devices.
      
      One problem is that it updates sch->q.qlen and sch->qstats.drops
      on the mq/mqprio root qdisc, while it should not : Daniele
      reported underflows errors :
      [  681.774821] PAX: sch->q.qlen: 0 n: 1
      [  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
      [  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
      [  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
      [  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
      [  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
      [  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
      [  681.774962] Call Trace:
      [  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
      [  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
      [  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
      [  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
      [  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
      [  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
      [  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
      [  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
      [  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
      [  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
      [  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
      [  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
      [  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
      [  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
      
      mq/mqprio have their own ways to report qlen/drops by folding stats on
      all their queues, with appropriate locking.
      
      A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
      without proper locking : concurrent qdisc updates could corrupt the list
      that qdisc_match_from_root() parses to find a qdisc given its handle.
      
      Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
      qdisc_tree_decrease_qlen() can use to abort its tree traversal,
      as soon as it meets a mq/mqprio qdisc children.
      
      Second problem can be fixed by RCU protection.
      Qdisc are already freed after RCU grace period, so qdisc_list_add() and
      qdisc_list_del() simply have to use appropriate rcu list variants.
      
      A future patch will add a per struct netdev_queue list anchor, so that
      qdisc_tree_decrease_qlen() can have more efficient lookups.
      Reported-by: NDaniele Fucini <dfucini@gmail.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4eaf3b84
    • E
      ipv6: kill sk_dst_lock · 6bd4f355
      Eric Dumazet 提交于
      While testing the np->opt RCU conversion, I found that UDP/IPv6 was
      using a mixture of xchg() and sk_dst_lock to protect concurrent changes
      to sk->sk_dst_cache, leading to possible corruptions and crashes.
      
      ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
      way to fix the mess is to remove sk_dst_lock completely, as we did for
      IPv4.
      
      __ip6_dst_store() and ip6_dst_store() share same implementation.
      
      sk_setup_caps() being called with socket lock being held or not,
      we have to use sk_dst_set() instead of __sk_dst_set()
      
      Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
      in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
      
      This is because ip6_dst_store() can be called from process context,
      without any lock held.
      
      As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
      from another cpu doing a concurrent ip6_dst_store()
      
      Doing the dst dereference before doing the install is needed to make
      sure no use after free would trigger.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bd4f355
  11. 03 12月, 2015 4 次提交
    • T
      cgroup: fix handling of multi-destination migration from subtree_control enabling · 1f7dd3e5
      Tejun Heo 提交于
      Consider the following v2 hierarchy.
      
        P0 (+memory) --- P1 (-memory) --- A
                                       \- B
             
      P0 has memory enabled in its subtree_control while P1 doesn't.  If
      both A and B contain processes, they would belong to the memory css of
      P1.  Now if memory is enabled on P1's subtree_control, memory csses
      should be created on both A and B and A's processes should be moved to
      the former and B's processes the latter.  IOW, enabling controllers
      can cause atomic migrations into different csses.
      
      The core cgroup migration logic has been updated accordingly but the
      controller migration methods haven't and still assume that all tasks
      migrate to a single target css; furthermore, the methods were fed the
      css in which subtree_control was updated which is the parent of the
      target csses.  pids controller depends on the migration methods to
      move charges and this made the controller attribute charges to the
      wrong csses often triggering the following warning by driving a
      counter negative.
      
       WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
       Modules linked in:
       CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
       ...
        ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
        ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
        ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
       Call Trace:
        [<ffffffff81551ffc>] dump_stack+0x4e/0x82
        [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
        [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
        [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
        [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
        [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
        [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
        [<ffffffff81189016>] cgroup_attach_task+0x176/0x200
        [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
        [<ffffffff81189684>] cgroup_procs_write+0x14/0x20
        [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
        [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
        [<ffffffff81265f88>] __vfs_write+0x28/0xe0
        [<ffffffff812666fc>] vfs_write+0xac/0x1a0
        [<ffffffff81267019>] SyS_write+0x49/0xb0
        [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76
      
      This patch fixes the bug by removing @css parameter from the three
      migration methods, ->can_attach, ->cancel_attach() and ->attach() and
      updating cgroup_taskset iteration helpers also return the destination
      css in addition to the task being migrated.  All controllers are
      updated accordingly.
      
      * Controllers which don't care whether there are one or multiple
        target csses can be converted trivially.  cpu, io, freezer, perf,
        netclassid and netprio fall in this category.
      
      * cpuset's current implementation assumes that there's single source
        and destination and thus doesn't support v2 hierarchy already.  The
        only change made by this patchset is how that single destination css
        is obtained.
      
      * memory migration path already doesn't do anything on v2.  How the
        single destination css is obtained is updated and the prep stage of
        mem_cgroup_can_attach() is reordered to accomodate the change.
      
      * pids is the only controller which was affected by this bug.  It now
        correctly handles multi-destination migrations and no longer causes
        counter underflow from incorrect accounting.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      1f7dd3e5
    • M
      sctp: convert sack_needed and sack_generation to bits · 38ee8fb6
      Marcelo Ricardo Leitner 提交于
      They don't need to be any bigger than that and with this we start a new
      bitfield for tracking association runtime stuff, like zero window
      situation.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38ee8fb6
    • E
      ipv6: add complete rcu protection around np->opt · 45f6fad8
      Eric Dumazet 提交于
      This patch addresses multiple problems :
      
      UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
      while socket is not locked : Other threads can change np->opt
      concurrently. Dmitry posted a syzkaller
      (http://github.com/google/syzkaller) program desmonstrating
      use-after-free.
      
      Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
      and dccp_v6_request_recv_sock() also need to use RCU protection
      to dereference np->opt once (before calling ipv6_dup_options())
      
      This patch adds full RCU protection to np->opt
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45f6fad8
    • S
      cpufreq: use last policy after online for drivers with ->setpolicy · 69030dd1
      Srinivas Pandruvada 提交于
      For cpufreq drivers which use setpolicy interface, after offline->online
      the policy is set to default. This can be reproduced by setting the
      default policy of intel_pstate or longrun to ondemand and then change to
      "performance". After offline and online, the setpolicy will be called with
      the policy=ondemand.
      
      For drivers using governors this condition is handled by storing
      last_governor, during offline and restoring during online. The same should
      be done for drivers using setpolicy interface. Storing last_policy during
      offline and restoring during online.
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      69030dd1
  12. 02 12月, 2015 4 次提交
  13. 30 11月, 2015 3 次提交
  14. 29 11月, 2015 2 次提交
    • B
      kref: Remove kref_put_spinlock_irqsave() · 3a66d7dc
      Bart Van Assche 提交于
      The last user is gone. Hence remove this function.
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Joern Engel <joern@logfs.org>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      3a66d7dc
    • N
      target: Fix race for SCF_COMPARE_AND_WRITE_POST checking · 057085e5
      Nicholas Bellinger 提交于
      This patch addresses a race + use after free where the first
      stage of COMPARE_AND_WRITE in compare_and_write_callback()
      is rescheduled after the backend sends the secondary WRITE,
      resulting in second stage compare_and_write_post() callback
      completing in target_complete_ok_work() before the first
      can return.
      
      Because current code depends on checking se_cmd->se_cmd_flags
      after return from se_cmd->transport_complete_callback(),
      this results in first stage having SCF_COMPARE_AND_WRITE_POST
      set, which incorrectly falls through into second stage CAW
      processing code, eventually triggering a NULL pointer
      dereference due to use after free.
      
      To address this bug, pass in a new *post_ret parameter into
      se_cmd->transport_complete_callback(), and depend upon this
      value instead of ->se_cmd_flags to determine when to return
      or fall through into ->queue_status() code for CAW.
      
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      057085e5
  15. 26 11月, 2015 3 次提交
    • M
      block/sd: Fix device-imposed transfer length limits · ca369d51
      Martin K. Petersen 提交于
      Commit 4f258a46 ("sd: Fix maximum I/O size for BLOCK_PC requests")
      had the unfortunate side-effect of removing an implicit clamp to
      BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
      code. This caused problems for some SMR drives.
      
      Debugging this issue revealed a few problems with the existing
      infrastructure since the block layer didn't know how to deal with
      device-imposed limits, only limits set by the I/O controller.
      
       - Introduce a new queue limit, max_dev_sectors, which is used by the
         ULD to signal the maximum sectors for a REQ_TYPE_FS request.
      
       - Ensure that max_dev_sectors is correctly stacked and taken into
         account when overriding max_sectors through sysfs.
      
       - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
         values for later processing.
      
       - In sd_revalidate() set the queue's max_dev_sectors based on the
         MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
         is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
         field size.
      
       - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
         VPD--if reported and sane--to signal the preferred device transfer
         size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.
      
       - blk_limits_max_hw_sectors() is no longer used and can be removed.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: sweeneygj@gmx.com
      Tested-by: NArzeets <anatol.pomozov@gmail.com>
      Tested-by: NDavid Eisner <david.eisner@oriel.oxon.org>
      Tested-by: NMario Kicherer <dev@kicherer.org>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      ca369d51
    • G
      ARM/PCI: Move align_resource function pointer to pci_host_bridge structure · 7c7a0e94
      Gabriele Paoloni 提交于
      Commit b3a72384 ("ARM/PCI: Replace pci_sys_data->align_resource with
      global function pointer") introduced an ARM-specific align_resource()
      function pointer.  This is not portable to other arches and doesn't work
      for platforms with two different PCIe host bridge controllers.
      
      Move the function pointer to the pci_host_bridge structure so each host
      bridge driver can specify its own align_resource() function.
      Signed-off-by: NGabriele Paoloni <gabriele.paoloni@huawei.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      7c7a0e94
    • D
      bpf: fix clearing on persistent program array maps · c9da161c
      Daniel Borkmann 提交于
      Currently, when having map file descriptors pointing to program arrays,
      there's still the issue that we unconditionally flush program array
      contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens
      when such a file descriptor is released and is independent of the map's
      refcount.
      
      Having this flush independent of the refcount is for a reason: there
      can be arbitrary complex dependency chains among tail calls, also circular
      ones (direct or indirect, nesting limit determined during runtime), and
      we need to make sure that the map drops all references to eBPF programs
      it holds, so that the map's refcount can eventually drop to zero and
      initiate its freeing. Btw, a walk of the whole dependency graph would
      not be possible for various reasons, one being complexity and another
      one inconsistency, i.e. new programs can be added to parts of the graph
      at any time, so there's no guaranteed consistent state for the time of
      such a walk.
      
      Now, the program array pinning itself works, but the issue is that each
      derived file descriptor on close would nevertheless call unconditionally
      into bpf_fd_array_map_clear(). Instead, keep track of users and postpone
      this flush until the last reference to a user is dropped. As this only
      concerns a subset of references (f.e. a prog array could hold a program
      that itself has reference on the prog array holding it, etc), we need to
      track them separately.
      
      Short analysis on the refcounting: on map creation time usercnt will be
      one, so there's no change in behaviour for bpf_map_release(), if unpinned.
      If we already fail in map_create(), we are immediately freed, and no
      file descriptor has been made public yet. In bpf_obj_pin_user(), we need
      to probe for a possible map in bpf_fd_probe_obj() already with a usercnt
      reference, so before we drop the reference on the fd with fdput().
      Therefore, if actual pinning fails, we need to drop that reference again
      in bpf_any_put(), otherwise we keep holding it. When last reference
      drops on the inode, the bpf_any_put() in bpf_evict_inode() will take
      care of dropping the usercnt again. In the bpf_obj_get_user() case, the
      bpf_any_get() will grab a reference on the usercnt, still at a time when
      we have the reference on the path. Should we later on fail to grab a new
      file descriptor, bpf_any_put() will drop it, otherwise we hold it until
      bpf_map_release() time.
      
      Joint work with Alexei.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9da161c