1. 13 12月, 2013 6 次提交
  2. 12 12月, 2013 1 次提交
  3. 11 12月, 2013 6 次提交
    • P
      sched/fair: Rework sched_fair time accounting · 9dbdb155
      Peter Zijlstra 提交于
      Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This
      results in him occasionally seeing time going backwards - which
      crashes the scheduler ...
      
      Most of our time accounting can actually handle that except the most
      common one; the tick time update of sched_fair.
      
      There is a further problem with that code; previously we assumed that
      because we get a tick every TICK_NSEC our time delta could never
      exceed 32bits and math was simpler.
      
      However, ever since Frederic managed to get NO_HZ_FULL merged; this is
      no longer the case since now a task can run for a long time indeed
      without getting a tick. It only takes about ~4.2 seconds to overflow
      our u32 in nanoseconds.
      
      This means we not only need to better deal with time going backwards;
      but also means we need to be able to deal with large deltas.
      
      This patch reworks the entire code and uses mul_u64_u32_shr() as
      proposed by Andy a long while ago.
      
      We express our virtual time scale factor in a u32 multiplier and shift
      right and the 32bit mul_u64_u32_shr() implementation reduces to a
      single 32x32->64 multiply if the time delta is still short (common
      case).
      
      For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128.
      Reported-and-Tested-by: NChristian Engelmayer <cengelma@gmx.at>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      Cc: Paul Turner <pjt@google.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9dbdb155
    • P
      math64: Add mul_u64_u32_shr() · be5e610c
      Peter Zijlstra 提交于
      Introduce mul_u64_u32_shr() as proposed by Andy a while back; it
      allows using 64x64->128 muls on 64bit archs and recent GCC
      which defines __SIZEOF_INT128__ and __int128.
      
      (This new method will be used by the scheduler.)
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      be5e610c
    • P
      sched: Remove PREEMPT_NEED_RESCHED from generic code · ba1f14fb
      Peter Zijlstra 提交于
      While hunting a preemption issue with Alexander, Ben noticed that the
      currently generic PREEMPT_NEED_RESCHED stuff is horribly broken for
      load-store architectures.
      
      We currently rely on the IPI to fold TIF_NEED_RESCHED into
      PREEMPT_NEED_RESCHED, but when this IPI lands while we already have
      a load for the preempt-count but before the store, the store will erase
      the PREEMPT_NEED_RESCHED change.
      
      The current preempt-count only works on load-store archs because
      interrupts are assumed to be completely balanced wrt their preempt_count
      fiddling; the previous preempt_count load will match the preempt_count
      state after the interrupt and therefore nothing gets lost.
      
      This patch removes the PREEMPT_NEED_RESCHED usage from generic code and
      pushes it into x86 arch code; the generic code goes back to relying on
      TIF_NEED_RESCHED.
      
      Boot tested on x86_64 and compile tested on ppc64.
      Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reported-and-Tested-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ba1f14fb
    • N
      sctp: properly latch and use autoclose value from sock to association · 9f70f46b
      Neil Horman 提交于
      Currently, sctp associations latch a sockets autoclose value to an association
      at association init time, subject to capping constraints from the max_autoclose
      sysctl value.  This leads to an odd situation where an application may set a
      socket level autoclose timeout, but sliently sctp will limit the autoclose
      timeout to something less than that.
      
      Fix this by modifying the autoclose setsockopt function to check the limit, cap
      it and warn the user via syslog that the timeout is capped.  This will allow
      getsockopt to return valid autoclose timeout values that reflect what subsequent
      associations actually use.
      
      While were at it, also elimintate the assoc->autoclose variable, it duplicates
      whats in the timeout array, which leads to multiple sources for the same
      information, that may differ (as the former isn't subject to any capping).  This
      gives us the timeout information in a canonical place and saves some space in
      the association structure as well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      CC: Wang Weidong <wangweidong1@huawei.com>
      CC: David Miller <davem@davemloft.net>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f70f46b
    • S
      net: unix: allow set_peek_off to fail · 12663bfc
      Sasha Levin 提交于
      unix_dgram_recvmsg() will hold the readlock of the socket until recv
      is complete.
      
      In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until
      unix_dgram_recvmsg() will complete (which can take a while) without allowing
      us to break out of it, triggering a hung task spew.
      
      Instead, allow set_peek_off to fail, this way userspace will not hang.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12663bfc
    • H
      x86, build, icc: Remove uninitialized_var() from compiler-intel.h · 503cf95c
      H. Peter Anvin 提交于
      When compiling with icc, <linux/compiler-gcc.h> ends up included
      because the icc environment defines __GNUC__.  Thus, we neither need
      nor want to have this macro defined in both compiler-gcc.h and
      compiler-intel.h, and the fact that they are inconsistent just makes
      the compiler spew warnings.
      Reported-by: NSunil K. Pandey <sunil.k.pandey@intel.com>
      Cc: Kevin B. Smith <kevin.b.smith@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-0mbwou1zt7pafij09b897lg3@git.kernel.org
      Cc: <stable@vger.kernel.org>
      503cf95c
  4. 10 12月, 2013 3 次提交
    • T
      ALSA: compress: Fix 64bit ABI incompatibility · 6733cf57
      Takashi Iwai 提交于
      snd_pcm_uframes_t is defined as unsigned long so it would take
      different sizes depending on 32 or 64bit architectures.  As we don't
      want this ABI incompatibility, and there is no real 64bit user yet,
      let's make it the fixed size with __u32.
      
      Also bump the protocol version number to 0.1.2.
      Acked-by: NVinod Koul <vinod.koul@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      6733cf57
    • S
      ALSA: memalloc.h - fix wrong truncation of dma_addr_t · 932e9dec
      Stefano Panella 提交于
      When running a 32bit kernel the hda_intel driver is still reporting
      a 64bit dma_mask if the HW supports it.
      
      From sound/pci/hda/hda_intel.c:
      
              /* allow 64bit DMA address if supported by H/W */
              if ((gcap & ICH6_GCAP_64OK) && !pci_set_dma_mask(pci, DMA_BIT_MASK(64)))
                      pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(64));
              else {
                      pci_set_dma_mask(pci, DMA_BIT_MASK(32));
                      pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(32));
              }
      
      which means when there is a call to dma_alloc_coherent from
      snd_malloc_dev_pages a machine address bigger than 32bit can be returned.
      This can be true in particular if running  the 32bit kernel as a pv dom0
      under the Xen Hypervisor or PAE on bare metal.
      
      The problem is that when calling setup_bdle to program the BLE the
      dma_addr_t returned from the dma_alloc_coherent is wrongly truncated
      from snd_sgbuf_get_addr if running a 32bit kernel:
      
      static inline dma_addr_t snd_sgbuf_get_addr(struct snd_dma_buffer *dmab,
                                                 size_t offset)
      {
              struct snd_sg_buf *sgbuf = dmab->private_data;
              dma_addr_t addr = sgbuf->table[offset >> PAGE_SHIFT].addr;
              addr &= PAGE_MASK;
              return addr + offset % PAGE_SIZE;
      }
      
      where PAGE_MASK in a 32bit kernel is zeroing the upper 32bit af addr.
      
      Without this patch the HW will fetch the 32bit truncated address,
      which is not the one obtained from dma_alloc_coherent and will result
      to a non working audio but can corrupt host memory at a random location.
      
      The current patch apply to v3.13-rc3-74-g6c843f5
      Signed-off-by: NStefano Panella <stefano.panella@citrix.com>
      Reviewed-by: NFrediano Ziglio <frediano.ziglio@citrix.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      932e9dec
    • P
      [media] videobuf2: Add support for file access mode flags for DMABUF exporting · c1b96a23
      Philipp Zabel 提交于
      Currently it is not possible for userspace to map a DMABUF exported buffer
      with write permissions. This patch allows to also pass O_RDONLY/O_RDWR when
      exporting the buffer, so that userspace may map it with write permissions.
      Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: NSylwester Nawrocki <s.nawrocki@samsung.com>
      Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
      c1b96a23
  5. 09 12月, 2013 1 次提交
  6. 08 12月, 2013 2 次提交
  7. 06 12月, 2013 3 次提交
  8. 03 12月, 2013 8 次提交
  9. 02 12月, 2013 2 次提交
    • E
      security: shmem: implement kernel private shmem inodes · c7277090
      Eric Paris 提交于
      We have a problem where the big_key key storage implementation uses a
      shmem backed inode to hold the key contents.  Because of this detail of
      implementation LSM checks are being done between processes trying to
      read the keys and the tmpfs backed inode.  The LSM checks are already
      being handled on the key interface level and should not be enforced at
      the inode level (since the inode is an implementation detail, not a
      part of the security model)
      
      This patch implements a new function shmem_kernel_file_setup() which
      returns the equivalent to shmem_file_setup() only the underlying inode
      has S_PRIVATE set.  This means that all LSM checks for the inode in
      question are skipped.  It should only be used for kernel internal
      operations where the inode is not exposed to userspace without proper
      LSM checking.  It is possible that some other users of
      shmem_file_setup() should use the new interface, but this has not been
      explored.
      
      Reproducing this bug is a little bit difficult.  The steps I used on
      Fedora are:
      
       (1) Turn off selinux enforcing:
      
      	setenforce 0
      
       (2) Create a huge key
      
      	k=`dd if=/dev/zero bs=8192 count=1 | keyctl padd big_key test-key @s`
      
       (3) Access the key in another context:
      
      	runcon system_u:system_r:httpd_t:s0-s0:c0.c1023 keyctl print $k >/dev/null
      
       (4) Examine the audit logs:
      
      	ausearch -m AVC -i --subject httpd_t | audit2allow
      
      If the last command's output includes a line that looks like:
      
      	allow httpd_t user_tmpfs_t:file { open read };
      
      There was an inode check between httpd and the tmpfs filesystem.  With
      this patch no such denial will be seen.  (NOTE! you should clear your
      audit log if you have tested for this previously)
      
      (Please return you box to enforcing)
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Hugh Dickins <hughd@google.com>
      cc: linux-mm@kvack.org
      c7277090
    • D
      KEYS: Fix multiple key add into associative array · 23fd78d7
      David Howells 提交于
      If sufficient keys (or keyrings) are added into a keyring such that a node in
      the associative array's tree overflows (each node has a capacity N, currently
      16) and such that all N+1 keys have the same index key segment for that level
      of the tree (the level'th nibble of the index key), then assoc_array_insert()
      calls ops->diff_objects() to indicate at which bit position the two index keys
      vary.
      
      However, __key_link_begin() passes a NULL object to assoc_array_insert() with
      the intention of supplying the correct pointer later before we commit the
      change.  This means that keyring_diff_objects() is given a NULL pointer as one
      of its arguments which it does not expect.  This results in an oops like the
      attached.
      
      With the previous patch to fix the keyring hash function, this can be forced
      much more easily by creating a keyring and only adding keyrings to it.  Add any
      other sort of key and a different insertion path is taken - all 16+1 objects
      must want to cluster in the same node slot.
      
      This can be tested by:
      
      	r=`keyctl newring sandbox @s`
      	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
      
      This should work fine, but oopses when the 17th keyring is added.
      
      Since ops->diff_objects() is always called with the first pointer pointing to
      the object to be inserted (ie. the NULL pointer), we can fix the problem by
      changing the to-be-inserted object pointer to point to the index key passed
      into assoc_array_insert() instead.
      
      Whilst we're at it, we also switch the arguments so that they are the same as
      for ->compare_object().
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
      IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      Call Trace:
       [<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
       [<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
       [<ffffffff811929a7>] __key_link_begin+0x78/0xe5
       [<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
       [<ffffffff81192e0a>] SyS_add_key+0x123/0x183
       [<ffffffff81400ddb>] tracesys+0xdd/0xe2
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NStephen Gallagher <sgallagh@redhat.com>
      23fd78d7
  10. 01 12月, 2013 1 次提交
  11. 29 11月, 2013 5 次提交
    • M
      [SCSI] Disable WRITE SAME for RAID and virtual host adapter drivers · 54b2b50c
      Martin K. Petersen 提交于
      Some host adapters do not pass commands through to the target disk
      directly. Instead they provide an emulated target which may or may not
      accurately report its capabilities. In some cases the physical device
      characteristics are reported even when the host adapter is processing
      commands on the device's behalf. This can lead to adapter firmware hangs
      or excessive I/O errors.
      
      This patch disables WRITE SAME for devices connected to host adapters
      that provide an emulated target. Driver writers can disable WRITE SAME
      by setting the no_write_same flag in the host adapter template.
      
      [jejb: fix up rejections due to eh_deadline patch]
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      54b2b50c
    • X
      sctp: Restore 'resent' bit to avoid retransmitted chunks for RTT measurements · 6eabca54
      Xufeng Zhang 提交于
      Currently retransmitted DATA chunks could also be used for
      RTT measurements since there are no flag to identify whether
      the transmitted DATA chunk is a new one or a retransmitted one.
      This problem is introduced by commit ae19c548 ("sctp: remove
      'resent' bit from the chunk") which inappropriately removed the
      'resent' bit completely, instead of doing this, we should set
      the resent bit only for the retransmitted DATA chunks.
      Signed-off-by: NXufeng Zhang <xufeng.zhang@windriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6eabca54
    • J
      genetlink/pmcraid: use proper genetlink multicast API · 5e53e689
      Johannes Berg 提交于
      The pmcraid driver is abusing the genetlink API and is using its
      family ID as the multicast group ID, which is invalid and may
      belong to somebody else (and likely will.)
      
      Make it use the correct API, but since this may already be used
      as-is by userspace, reserve a family ID for this code and also
      reserve that group ID to not break userspace assumptions.
      
      My previous patch broke event delivery in the driver as I missed
      that it wasn't using the right API and forgot to update it later
      in my series.
      
      While changing this, I noticed that the genetlink code could use
      the static group ID instead of a strcmp(), so also do that for
      the VFS_DQUOT family.
      
      Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e53e689
    • N
      diag: warn about missing first netlink attribute · 31e20bad
      Nicolas Dichtel 提交于
      The first netlink attribute (value 0) must always be defined as none/unspec.
      This is correctly done in inet_diag.h, but other diag interfaces are wrong.
      
      Because we cannot change an existing API, I add a comment to point the mistake
      and avoid to propagate it in a new diag API in the future.
      
      CC: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31e20bad
    • S
      efivars, efi-pstore: Hold off deletion of sysfs entry until the scan is completed · e0d59733
      Seiji Aguchi 提交于
      Currently, when mounting pstore file system, a read callback of
      efi_pstore driver runs mutiple times as below.
      
      - In the first read callback, scan efivar_sysfs_list from head and pass
        a kmsg buffer of a entry to an upper pstore layer.
      - In the second read callback, rescan efivar_sysfs_list from the entry
        and pass another kmsg buffer to it.
      - Repeat the scan and pass until the end of efivar_sysfs_list.
      
      In this process, an entry is read across the multiple read function
      calls. To avoid race between the read and erasion, the whole process
      above is protected by a spinlock, holding in open() and releasing in
      close().
      
      At the same time, kmemdup() is called to pass the buffer to pstore
      filesystem during it. And then, it causes a following lockdep warning.
      
      To make the dynamic memory allocation runnable without taking spinlock,
      holding off a deletion of sysfs entry if it happens while scanning it
      via efi_pstore, and deleting it after the scan is completed.
      
      To implement it, this patch introduces two flags, scanning and deleting,
      to efivar_entry.
      
      On the code basis, it seems that all the scanning and deleting logic is
      not needed because __efivars->lock are not dropped when reading from the
      EFI variable store.
      
      But, the scanning and deleting logic is still needed because an
      efi-pstore and a pstore filesystem works as follows.
      
      In case an entry(A) is found, the pointer is saved to psi->data.  And
      efi_pstore_read() passes the entry(A) to a pstore filesystem by
      releasing  __efivars->lock.
      
      And then, the pstore filesystem calls efi_pstore_read() again and the
      same entry(A), which is saved to psi->data, is used for resuming to scan
      a sysfs-list.
      
      So, to protect the entry(A), the logic is needed.
      
      [    1.143710] ------------[ cut here ]------------
      [    1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110()
      [    1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
      [    1.144058] Modules linked in:
      [    1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
      [    1.144058]  0000000000000009 ffff8800797e9ae0 ffffffff816614a5 ffff8800797e9b28
      [    1.144058]  ffff8800797e9b18 ffffffff8105510d 0000000000000080 0000000000000046
      [    1.144058]  00000000000000d0 00000000000003af ffffffff81ccd0c0 ffff8800797e9b78
      [    1.144058] Call Trace:
      [    1.144058]  [<ffffffff816614a5>] dump_stack+0x54/0x74
      [    1.144058]  [<ffffffff8105510d>] warn_slowpath_common+0x7d/0xa0
      [    1.144058]  [<ffffffff8105517c>] warn_slowpath_fmt+0x4c/0x50
      [    1.144058]  [<ffffffff8131290f>] ? vsscanf+0x57f/0x7b0
      [    1.144058]  [<ffffffff810bbd74>] lockdep_trace_alloc+0x104/0x110
      [    1.144058]  [<ffffffff81192da0>] __kmalloc_track_caller+0x50/0x280
      [    1.144058]  [<ffffffff815147bb>] ? efi_pstore_read_func.part.1+0x12b/0x170
      [    1.144058]  [<ffffffff8115b260>] kmemdup+0x20/0x50
      [    1.144058]  [<ffffffff815147bb>] efi_pstore_read_func.part.1+0x12b/0x170
      [    1.144058]  [<ffffffff81514800>] ? efi_pstore_read_func.part.1+0x170/0x170
      [    1.144058]  [<ffffffff815148b4>] efi_pstore_read_func+0xb4/0xe0
      [    1.144058]  [<ffffffff81512b7b>] __efivar_entry_iter+0xfb/0x120
      [    1.144058]  [<ffffffff8151428f>] efi_pstore_read+0x3f/0x50
      [    1.144058]  [<ffffffff8128d7ba>] pstore_get_records+0x9a/0x150
      [    1.158207]  [<ffffffff812af25c>] ? selinux_d_instantiate+0x1c/0x20
      [    1.158207]  [<ffffffff8128ce30>] ? parse_options+0x80/0x80
      [    1.158207]  [<ffffffff8128ced5>] pstore_fill_super+0xa5/0xc0
      [    1.158207]  [<ffffffff811ae7d2>] mount_single+0xa2/0xd0
      [    1.158207]  [<ffffffff8128ccf8>] pstore_mount+0x18/0x20
      [    1.158207]  [<ffffffff811ae8b9>] mount_fs+0x39/0x1b0
      [    1.158207]  [<ffffffff81160550>] ? __alloc_percpu+0x10/0x20
      [    1.158207]  [<ffffffff811c9493>] vfs_kern_mount+0x63/0xf0
      [    1.158207]  [<ffffffff811cbb0e>] do_mount+0x23e/0xa20
      [    1.158207]  [<ffffffff8115b51b>] ? strndup_user+0x4b/0xf0
      [    1.158207]  [<ffffffff811cc373>] SyS_mount+0x83/0xc0
      [    1.158207]  [<ffffffff81673cc2>] system_call_fastpath+0x16/0x1b
      [    1.158207] ---[ end trace 61981bc62de9f6f4 ]---
      Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
      Tested-by: NMadper Xie <cxie@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      e0d59733
  12. 28 11月, 2013 2 次提交