1. 21 1月, 2015 6 次提交
  2. 19 1月, 2015 1 次提交
  3. 17 1月, 2015 2 次提交
    • J
      genetlink: synchronize socket closing and family removal · ee1c2442
      Johannes Berg 提交于
      In addition to the problem Jeff Layton reported, I looked at the code
      and reproduced the same warning by subscribing and removing the genl
      family with a socket still open. This is a fairly tricky race which
      originates in the fact that generic netlink allows the family to go
      away while sockets are still open - unlike regular netlink which has
      a module refcount for every open socket so in general this cannot be
      triggered.
      
      Trying to resolve this issue by the obvious locking isn't possible as
      it will result in deadlocks between unregistration and group unbind
      notification (which incidentally lockdep doesn't find due to the home
      grown locking in the netlink table.)
      
      To really resolve this, introduce a "closing socket" reference counter
      (for generic netlink only, as it's the only affected family) in the
      core netlink code and use that in generic netlink to wait for all the
      sockets that are being closed at the same time as a generic netlink
      family is removed.
      
      This fixes the race that when a socket is closed, it will should call
      the unbind, but if the family is removed at the same time the unbind
      will not find it, leading to the warning. The real problem though is
      that in this case the unbind could actually find a new family that is
      registered to have a multicast group with the same ID, and call its
      mcast_unbind() leading to confusing.
      
      Also remove the warning since it would still trigger, but is now no
      longer a problem.
      
      This also moves the code in af_netlink.c to before unreferencing the
      module to avoid having the same problem in the normal non-genl case.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee1c2442
    • J
      genetlink: document parallel_ops · f555f3d7
      Johannes Berg 提交于
      The kernel-doc for the parallel_ops family struct member is
      missing, add it.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f555f3d7
  4. 15 1月, 2015 3 次提交
  5. 14 1月, 2015 2 次提交
  6. 13 1月, 2015 2 次提交
    • J
      x86/xen: properly retrieve NMI reason · f221b04f
      Jan Beulich 提交于
      Using the native code here can't work properly, as the hypervisor would
      normally have cleared the two reason bits by the time Dom0 gets to see
      the NMI (if passed to it at all). There's a shared info field for this,
      and there's an existing hook to use - just fit the two together. This
      is particularly relevant so that NMIs intended to be handled by APEI /
      GHES actually make it to the respective handler.
      
      Note that the hook can (and should) be used irrespective of whether
      being in Dom0, as accessing port 0x61 in a DomU would be even worse,
      while the shared info field would just hold zero all the time. Note
      further that hardware NMI handling for PVH doesn't currently work
      anyway due to missing code in the hypervisor (but it is expected to
      work the native rather than the PV way).
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      f221b04f
    • W
      mm: mmu_gather: use tlb->end != 0 only for TLB invalidation · 721c21c1
      Will Deacon 提交于
      When batching up address ranges for TLB invalidation, we check tlb->end
      != 0 to indicate that some pages have actually been unmapped.
      
      As of commit f045bbb9 ("mmu_gather: fix over-eager
      tlb_flush_mmu_free() calling"), we use the same check for freeing these
      pages in order to avoid a performance regression where we call
      free_pages_and_swap_cache even when no pages are actually queued up.
      
      Unfortunately, the range could have been reset (tlb->end = 0) by
      tlb_end_vma, which has been shown to cause memory leaks on arm64.
      Furthermore, investigation into these leaks revealed that the fullmm
      case on task exit no longer invalidates the TLB, by virtue of tlb->end
       == 0 (in 3.18, need_flush would have been set).
      
      This patch resolves the problem by reverting commit f045bbb9, using
      instead tlb->local.nr as the predicate for page freeing in
      tlb_flush_mmu_free and ensuring that tlb->end is initialised to a
      non-zero value in the fullmm case.
      Tested-by: NMark Langsdorf <mlangsdo@redhat.com>
      Tested-by: NDave Hansen <dave@sr71.net>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      721c21c1
  7. 12 1月, 2015 2 次提交
  8. 10 1月, 2015 1 次提交
  9. 09 1月, 2015 5 次提交
  10. 08 1月, 2015 6 次提交
    • K
      blk-mq: Allow requests to never expire · 5b3f25fc
      Keith Busch 提交于
      Some types of requests may be started that are not gauranteed to ever
      complete. This adds a request flag that a driver can use so mark the
      request as such.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      5b3f25fc
    • J
      blk-mq: Add helper to abort requeued requests · 1885b24d
      Jens Axboe 提交于
      Adds a helper function a driver can use to abort requeued requests in
      case any are pending when h/w queues are being removed.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1885b24d
    • K
      blk-mq: Let drivers cancel requeue_work · c68ed59f
      Keith Busch 提交于
      Kicking requeued requests will start h/w queues in a work_queue, which
      may alter the driver's requested state to temporarily stop them. This
      patch exports a method to cancel the q->requeue_work so a driver can be
      assured stopped h/w queues won't be started up before it is ready.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c68ed59f
    • K
      blk-mq: Export if requests were started · 973c0191
      Keith Busch 提交于
      Drivers can iterate over all allocated request tags, but their callback
      needs a way to know if the driver started the request in the first place.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      973c0191
    • M
      libata: Whitelist SSDs that are known to properly return zeroes after TRIM · e61f7d1c
      Martin K. Petersen 提交于
      As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return
      Zero After Trim) flags in the ATA Command Set are unreliable in the
      sense that they only define what happens if the device successfully
      executed the DSM TRIM command. TRIM is only advisory, however, and the
      device is free to silently ignore all or parts of the request.
      
      In practice this renders the DRAT and RZAT flags completely useless and
      because the results are unpredictable we decided to disable discard in
      MD for 3.18 to avoid the risk of data corruption.
      
      Hardware vendors in the real world obviously need better guarantees than
      what the standards bodies provide. Unfortuntely those guarantees are
      encoded in product requirements documents rather than somewhere we can
      key off of them programatically. So we are compelled to disabling
      discard_zeroes_data for all devices unless we explicitly have data to
      support whitelisting them.
      
      This patch whitelists SSDs from a few of the main vendors. None of the
      whitelists are based on written guarantees. They are purely based on
      empirical evidence collected from internal and external users that have
      tested or qualified these drives in RAID deployments.
      
      The whitelist is only meant as a starting point and is by no means
      comprehensive:
      
         - All intel SSD models except for 510
         - Micron M5?0/M600
         - Samsung SSDs
         - Seagate SSDs
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      e61f7d1c
    • J
      blk-mq: get rid of ->cmd_size in the hardware queue · 17ded320
      Jens Axboe 提交于
      We store it in the tag set, we don't need it in the hardware queue.
      While removing cmd_size, place ->queue_num further down to avoid
      a hole on 64-bit archs. It's not used in any fast paths, so we
      can safely move it.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      17ded320
  11. 07 1月, 2015 2 次提交
    • L
      mm: propagate error from stack expansion even for guard page · fee7e49d
      Linus Torvalds 提交于
      Jay Foad reports that the address sanitizer test (asan) sometimes gets
      confused by a stack pointer that ends up being outside the stack vma
      that is reported by /proc/maps.
      
      This happens due to an interaction between RLIMIT_STACK and the guard
      page: when we do the guard page check, we ignore the potential error
      from the stack expansion, which effectively results in a missing guard
      page, since the expected stack expansion won't have been done.
      
      And since /proc/maps explicitly ignores the guard page (commit
      d7824370: "mm: fix up some user-visible effects of the stack guard
      page"), the stack pointer ends up being outside the reported stack area.
      
      This is the minimal patch: it just propagates the error.  It also
      effectively makes the guard page part of the stack limit, which in turn
      measn that the actual real stack is one page less than the stack limit.
      
      Let's see if anybody notices.  We could teach acct_stack_growth() to
      allow an extra page for a grow-up/grow-down stack in the rlimit test,
      but I don't want to add more complexity if it isn't needed.
      Reported-and-tested-by: NJay Foad <jay.foad@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fee7e49d
    • O
      drm/amdkfd: reformat IOCTL definitions to drm-style · b81c55db
      Oded Gabbay 提交于
      This patch reformats the ioctl definitions in kfd_ioctl.h to be similar to the
      drm ioctls definition style.
      
      v2: Renamed KFD_COMMAND_(START|END) to AMDKFD_...
      Signed-off-by: NOded Gabbay <oded.gabbay@amd.com>
      Acked-by: NChristian König <christian.koenig@amd.com>
      b81c55db
  12. 06 1月, 2015 3 次提交
  13. 05 1月, 2015 1 次提交
  14. 30 12月, 2014 2 次提交
    • L
      ALSA: pcm: Fix kerneldoc for params_*() functions · 62f64a88
      Lars-Peter Clausen 提交于
      Fix a copy and paste error in the kernel doc description for the params_*()
      functions.
      Signed-off-by: NLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      62f64a88
    • M
      mm: get rid of radix tree gfp mask for pagecache_get_page · 45f87de5
      Michal Hocko 提交于
      Commit 2457aec6 ("mm: non-atomically mark page accessed during page
      cache allocation where possible") has added a separate parameter for
      specifying gfp mask for radix tree allocations.
      
      Not only this is less than optimal from the API point of view because it
      is error prone, it is also buggy currently because
      grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if
      fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by
      AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then
      the radix tree allocation wouldn't obey the restriction and might
      recurse into filesystem and cause deadlocks.  This is the case for most
      filesystems unfortunately because only ext4 and gfs2 are using
      AOP_FLAG_NOFS.
      
      Let's simply remove radix_gfp_mask parameter because the allocation
      context is same for both page cache and for the radix tree.  Just make
      sure that the radix tree gets only the sane subset of the mask (e.g.  do
      not pass __GFP_WRITE).
      
      Long term it is more preferable to convert remaining users of
      AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
      interface even further.
      Reported-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      45f87de5
  15. 29 12月, 2014 1 次提交
  16. 27 12月, 2014 1 次提交
    • J
      netlink/genetlink: pass network namespace to bind/unbind · 023e2cfa
      Johannes Berg 提交于
      Netlink families can exist in multiple namespaces, and for the most
      part multicast subscriptions are per network namespace. Thus it only
      makes sense to have bind/unbind notifications per network namespace.
      
      To achieve this, pass the network namespace of a given client socket
      to the bind/unbind functions.
      
      Also do this in generic netlink, and there also make sure that any
      bind for multicast groups that only exist in init_net is rejected.
      This isn't really a problem if it is accepted since a client in a
      different namespace will never receive any notifications from such
      a group, but it can confuse the family if not rejected (it's also
      possible to silently (without telling the family) accept it, but it
      would also have to be ignored on unbind so families that take any
      kind of action on bind/unbind won't do unnecessary work for invalid
      clients like that.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      023e2cfa