1. 27 3月, 2014 3 次提交
  2. 25 3月, 2014 1 次提交
  3. 22 3月, 2014 1 次提交
  4. 21 3月, 2014 6 次提交
    • H
      mm: fix swapops.h:131 bug if remap_file_pages raced migration · 7e09e738
      Hugh Dickins 提交于
      Add remove_linear_migration_ptes_from_nonlinear(), to fix an interesting
      little include/linux/swapops.h:131 BUG_ON(!PageLocked) found by trinity:
      indicating that remove_migration_ptes() failed to find one of the
      migration entries that was temporarily inserted.
      
      The problem comes from remap_file_pages()'s switch from vma_interval_tree
      (good for inserting the migration entry) to i_mmap_nonlinear list (no good
      for locating it again); but can only be a problem if the remap_file_pages()
      range does not cover the whole of the vma (zap_pte() clears the range).
      
      remove_migration_ptes() needs a file_nonlinear method to go down the
      i_mmap_nonlinear list, applying linear location to look for migration
      entries in those vmas too, just in case there was this race.
      
      The file_nonlinear method does need rmap_walk_control.arg to do this;
      but it never needed vma passed in - vma comes from its own iteration.
      Reported-and-tested-by: NDave Jones <davej@redhat.com>
      Reported-and-tested-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e09e738
    • B
      net: cdc_ncm: respect operator preferred MTU reported by MBIM · 259fef03
      Ben Chan 提交于
      According to "Universal Serial Bus Communications Class Subclass
      Specification for Mobile Broadband Interface Model, Revision 1.0,
      Errata-1" published by USB-IF, the wMTU field of the MBIM extended
      functional descriptor indicates the operator preferred MTU for IP data
      streams.
      
      This patch modifies cdc_ncm_setup to ensure that the MTU value set on
      the usbnet device does not exceed the operator preferred MTU indicated
      by wMTU if the MBIM device exposes a MBIM extended functional
      descriptor.
      Signed-off-by: NBen Chan <benchan@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      259fef03
    • M
      net/mlx4: Adapt code for N-Port VF · 449fc488
      Matan Barak 提交于
      Adds support for N-Port VFs, this includes:
      1. Adding support in the wrapped FW command
      	In wrapped commands, we need to verify and convert
      	the slave's port into the real physical port.
      	Furthermore, when sending the response back to the slave,
      	a reverse conversion should be made.
      2. Adjusting sqpn for QP1 para-virtualization
      	The slave assumes that sqpn is used for QP1 communication.
      	If the slave is assigned to a port != (first port), we need
      	to adjust the sqpn that will direct its QP1 packets into the
      	correct endpoint.
      3. Adjusting gid[5] to modify the port for raw ethernet
      	In B0 steering, gid[5] contains the port. It needs
      	to be adjusted into the physical port.
      4. Adjusting number of ports in the query / ports caps in the FW commands
      	When a slave queries the hardware, it needs to view only
      	the physical ports it's assigned to.
      5. Adjusting the sched_qp according to the port number
      	The QP port is encoded in the sched_qp, thus in modify_qp we need
      	to encode the correct port in sched_qp.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      449fc488
    • M
      net/mlx4: Add utils for N-Port VFs · f74462ac
      Matan Barak 提交于
      This patch adds the following utils:
      1. Convert slave_id -> VF
      2. Get the active ports by slave_id
      3. Convert slave's port to real port
      4. Get the slave's port from real port
      5. Get all slaves that uses the i'th real port
      6. Get all slaves that uses the i'th real port exclusively
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f74462ac
    • M
      net/mlx4: Add data structures to support N-Ports per VF · 1ab95d37
      Matan Barak 提交于
      Adds the required data structures to support VFs with N (1 or 2)
      ports instead of always using the number of physical ports.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ab95d37
    • V
      tracing: Fix array size mismatch in format string · 87291347
      Vaibhav Nagarnaik 提交于
      In event format strings, the array size is reported in two locations.
      One in array subscript and then via the "size:" attribute. The values
      reported there have a mismatch.
      
      For e.g., in sched:sched_switch the prev_comm and next_comm character
      arrays have subscript values as [32] where as the actual field size is
      16.
      
      name: sched_switch
      ID: 301
      format:
              field:unsigned short common_type;       offset:0;       size:2; signed:0;
              field:unsigned char common_flags;       offset:2;       size:1; signed:0;
              field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
              field:int common_pid;   offset:4;       size:4; signed:1;
      
              field:char prev_comm[32];       offset:8;       size:16;        signed:1;
              field:pid_t prev_pid;   offset:24;      size:4; signed:1;
              field:int prev_prio;    offset:28;      size:4; signed:1;
              field:long prev_state;  offset:32;      size:8; signed:1;
              field:char next_comm[32];       offset:40;      size:16;        signed:1;
              field:pid_t next_pid;   offset:56;      size:4; signed:1;
              field:int next_prio;    offset:60;      size:4; signed:1;
      
      After bisection, the following commit was blamed:
      92edca07 tracing: Use direct field, type and system names
      
      This commit removes the duplication of strings for field->name and
      field->type assuming that all the strings passed in
      __trace_define_field() are immutable. This is not true for arrays, where
      the type string is created in event_storage variable and field->type for
      all array fields points to event_storage.
      
      Use __stringify() to create a string constant for the type string.
      
      Also, get rid of event_storage and event_storage_mutex that are not
      needed anymore.
      
      also, an added benefit is that this reduces the overhead of events a bit more:
      
         text    data     bss     dec     hex filename
      8424787 2036472 1302528 11763787         b3804b vmlinux
      8420814 2036408 1302528 11759750         b37086 vmlinux.patched
      
      Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com
      
      Cc: Laurent Chavey <chavey@google.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      87291347
  5. 20 3月, 2014 1 次提交
  6. 19 3月, 2014 1 次提交
  7. 18 3月, 2014 6 次提交
  8. 15 3月, 2014 1 次提交
  9. 13 3月, 2014 4 次提交
  10. 12 3月, 2014 1 次提交
  11. 11 3月, 2014 1 次提交
    • J
      mm: fix GFP_THISNODE callers and clarify · e97ca8e5
      Johannes Weiner 提交于
      GFP_THISNODE is for callers that implement their own clever fallback to
      remote nodes.  It restricts the allocation to the specified node and
      does not invoke reclaim, assuming that the caller will take care of it
      when the fallback fails, e.g.  through a subsequent allocation request
      without GFP_THISNODE set.
      
      However, many current GFP_THISNODE users only want the node exclusive
      aspect of the flag, without actually implementing their own fallback or
      triggering reclaim if necessary.  This results in things like page
      migration failing prematurely even when there is easily reclaimable
      memory available, unless kswapd happens to be running already or a
      concurrent allocation attempt triggers the necessary reclaim.
      
      Convert all callsites that don't implement their own fallback strategy
      to __GFP_THISNODE.  This restricts the allocation a single node too, but
      at the same time allows the allocator to enter the slowpath, wake
      kswapd, and invoke direct reclaim if necessary, to make the allocation
      happen when memory is full.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e97ca8e5
  12. 10 3月, 2014 3 次提交
    • A
      get rid of fget_light() · bd2a31d5
      Al Viro 提交于
      instead of returning the flags by reference, we can just have the
      low-level primitive return those in lower bits of unsigned long,
      with struct file * derived from the rest.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bd2a31d5
    • L
      vfs: atomic f_pos accesses as per POSIX · 9c225f26
      Linus Torvalds 提交于
      Our write() system call has always been atomic in the sense that you get
      the expected thread-safe contiguous write, but we haven't actually
      guaranteed that concurrent writes are serialized wrt f_pos accesses, so
      threads (or processes) that share a file descriptor and use "write()"
      concurrently would quite likely overwrite each others data.
      
      This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:
      
       "2.9.7 Thread Interactions with Regular File Operations
      
        All of the following functions shall be atomic with respect to each
        other in the effects specified in POSIX.1-2008 when they operate on
        regular files or symbolic links: [...]"
      
      and one of the effects is the file position update.
      
      This unprotected file position behavior is not new behavior, and nobody
      has ever cared.  Until now.  Yongzhi Pan reported unexpected behavior to
      Michael Kerrisk that was due to this.
      
      This resolves the issue with a f_pos-specific lock that is taken by
      read/write/lseek on file descriptors that may be shared across threads
      or processes.
      Reported-by: NYongzhi Pan <panyongzhi@gmail.com>
      Reported-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9c225f26
    • N
      selinux: add gfp argument to security_xfrm_policy_alloc and fix callers · 52a4c640
      Nikolay Aleksandrov 提交于
      security_xfrm_policy_alloc can be called in atomic context so the
      allocation should be done with GFP_ATOMIC. Add an argument to let the
      callers choose the appropriate way. In order to do so a gfp argument
      needs to be added to the method xfrm_policy_alloc_security in struct
      security_operations and to the internal function
      selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
      callers and leave GFP_KERNEL as before for the rest.
      The path that needed the gfp argument addition is:
      security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
      all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
      selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)
      
      Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
      add it to security_context_to_sid which is used inside and prior to this
      patch did only GFP_KERNEL allocation. So add gfp argument to
      security_context_to_sid and adjust all of its callers as well.
      
      CC: Paul Moore <paul@paul-moore.com>
      CC: Dave Jones <davej@redhat.com>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Fan Du <fan.du@windriver.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: LSM list <linux-security-module@vger.kernel.org>
      CC: SELinux list <selinux@tycho.nsa.gov>
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      52a4c640
  13. 07 3月, 2014 3 次提交
  14. 06 3月, 2014 3 次提交
  15. 05 3月, 2014 1 次提交
  16. 04 3月, 2014 4 次提交
    • L
      mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS · 1ae71d03
      Liu Ping Fan 提交于
      When doing some numa tests on powerpc, I triggered an oops bug.  I find
      it is caused by using page->_last_cpupid.  It should be initialized as
      "-1 & LAST_CPUPID_MASK", but not "-1".  Otherwise, in task_numa_fault(),
      we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).  And
      finally cause an oops bug in task_numa_group(), since the online cpu is
      less than possible cpu.  This happen with CONFIG_SPARSE_VMEMMAP disabled
      
      Call trace:
      
        SMP NR_CPUS=64 NUMA PowerNV
        Modules linked in:
        CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32
        task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000
        REGS: c000001e32c53510 TRAP: 0300   Not tainted(3.13.0-rc1+)
        MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR:28024424  XER: 20000000
        CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1
        NIP  .task_numa_fault+0x1470/0x2370
        LR  .task_numa_fault+0x1468/0x2370
        Call Trace:
         .task_numa_fault+0x1468/0x2370 (unreliable)
         .do_numa_page+0x480/0x4a0
         .handle_mm_fault+0x4ec/0xc90
         .do_page_fault+0x3a8/0x890
         handle_page_fault+0x10/0x30
        Instruction dump:
        3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
        3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4
        ---[ end trace 15f2510da5ae07cf ]---
      Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ae71d03
    • V
      mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking · 9050d7eb
      Vlastimil Babka 提交于
      Daniel Borkmann reported a VM_BUG_ON assertion failing:
      
        ------------[ cut here ]------------
        kernel BUG at mm/mlock.c:528!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ccm arc4 iwldvm [...]
         video
        CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
        Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
        task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
        RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
        Call Trace:
          do_munmap+0x18f/0x3b0
          vm_munmap+0x41/0x60
          SyS_munmap+0x22/0x30
          system_call_fastpath+0x1a/0x1f
        RIP   munlock_vma_pages_range+0x2e0/0x2f0
        ---[ end trace a0088dcf07ae10f2 ]---
      
      because munlock_vma_pages_range() thinks it's unexpectedly in the middle
      of a THP page.  This can be reproduced with default config since 3.11
      kernels.  A reproducer can be found in the kernel's selftest directory
      for networking by running ./psock_tpacket.
      
      The problem is that an order=2 compound page (allocated by
      alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
      by packet_mmap()) and mistaken for a THP page and assumed to be order=9.
      
      The checks for THP in munlock came with commit ff6a6da6 ("mm:
      accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
      not trigger a bug.  It just makes munlock_vma_pages_range() skip such
      compound pages until the next 512-pages-aligned page, when it encounters
      a head page.  This is however not a problem for vma's where mlocking has
      no effect anyway, but it can distort the accounting.
      
      Since commit 7225522b ("mm: munlock: batch non-THP page isolation
      and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
      PageTransHuge() check.
      
      This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
      list of flags that make vma's non-mlockable and non-mergeable.  The
      reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
      already on the VM_SPECIAL list, and both are intended for non-LRU pages
      where mlocking makes no sense anyway.  Related Lkml discussion can be
      found in [2].
      
       [1] tools/testing/selftests/net/psock_tpacket
       [2] https://lkml.org/lkml/2014/1/10/427Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Reported-by: NDaniel Borkmann <dborkman@redhat.com>
      Tested-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: John David Anglin <dave.anglin@bell.net>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Tested-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org> [3.11.x+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9050d7eb
    • D
      mm: close PageTail race · 668f9abb
      David Rientjes 提交于
      Commit bf6bddf1 ("mm: introduce compaction and migration for
      ballooned pages") introduces page_count(page) into memory compaction
      which dereferences page->first_page if PageTail(page).
      
      This results in a very rare NULL pointer dereference on the
      aforementioned page_count(page).  Indeed, anything that does
      compound_head(), including page_count() is susceptible to racing with
      prep_compound_page() and seeing a NULL or dangling page->first_page
      pointer.
      
      This patch uses Andrea's implementation of compound_trans_head() that
      deals with such a race and makes it the default compound_head()
      implementation.  This includes a read memory barrier that ensures that
      if PageTail(head) is true that we return a head page that is neither
      NULL nor dangling.  The patch then adds a store memory barrier to
      prep_compound_page() to ensure page->first_page is set.
      
      This is the safest way to ensure we see the head page that we are
      expecting, PageTail(page) is already in the unlikely() path and the
      memory barriers are unfortunately required.
      
      Hugetlbfs is the exception, we don't enforce a store memory barrier
      during init since no race is possible.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      668f9abb
    • S
      tracing: Do not add event files for modules that fail tracepoints · 45ab2813
      Steven Rostedt (Red Hat) 提交于
      If a module fails to add its tracepoints due to module tainting, do not
      create the module event infrastructure in the debugfs directory. As the events
      will not work and worse yet, they will silently fail, making the user wonder
      why the events they enable do not display anything.
      
      Having a warning on module load and the events not visible to the users
      will make the cause of the problem much clearer.
      
      Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org
      
      Fixes: 6d723736 "tracing/events: add support for modules to TRACE_EVENT"
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: stable@vger.kernel.org # 2.6.31+
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      45ab2813