1. 05 Jan, 2016 1 commit
  2. 31 Dec, 2015 1 commit
  3. 30 Dec, 2015 1 commit
    • mm/vmstat: fix overflow in mod_zone_page_state() · 6cdb18ad
      Committed by Heiko Carstens
      mod_zone_page_state() takes a "delta" integer argument.  delta contains
      the number of pages that should be added or subtracted from a struct
      zone's vm_stat field.
      
      If a zone is larger than 8TB this will cause overflows.  E.g.  for a
      zone with a size slightly larger than 8TB the line
      
          mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
      
      in mm/page_alloc.c:free_area_init_core() will result in a negative
      value for the NR_ALLOC_BATCH entry within the zone's vm_stat, since 8TB
      contains 0x8xxxxxxx pages, which will be sign-extended to a negative
      value.
      
      Fix this by changing the delta argument to long type.
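
      The truncation can be reproduced outside the kernel. The following is a
      minimal userspace sketch, not kernel code: the function names are
      hypothetical, 4 KiB pages are assumed, and the narrowing conversion relies
      on the wrap-around behaviour of common two's-complement compilers such as
      GCC and Clang.

```c
#include <stdint.h>

/* Pages in a zone of the given size, assuming 4 KiB pages. */
static uint64_t pages_in(uint64_t bytes)
{
    return bytes / 4096;
}

/*
 * Hypothetical stand-in for the old signature: the page count is
 * squeezed into a 32-bit signed delta before it is added.
 * The conversion is implementation-defined in C; GCC and Clang wrap.
 */
static int64_t add_with_int_delta(int64_t counter, uint64_t pages)
{
    int32_t delta = (int32_t)pages;
    return counter + delta;
}

/*
 * Hypothetical stand-in for the fixed signature: the delta is 64 bits
 * wide, as "long" is on the 64-bit systems discussed here.
 */
static int64_t add_with_long_delta(int64_t counter, uint64_t pages)
{
    int64_t delta = (int64_t)pages;
    return counter + delta;
}
```

      For a zone of exactly 8 TiB, pages_in() yields 0x80000000. The 32-bit
      variant then leaves a negative counter, while the 64-bit variant keeps the
      positive page count.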
      
      This could fix an early boot problem seen on s390, where we have a 9TB
      system with only one node.  ZONE_DMA contains 2GB and ZONE_NORMAL the
      rest.  The system is trying to allocate a GFP_DMA page but ZONE_DMA is
      completely empty, so it tries to reclaim pages in an endless loop.
      
      This was seen on a heavily patched 3.10 kernel.  One possible
      explanation seems to be the overflows caused by mod_zone_page_state().
      Unfortunately I did not have the chance to verify that this patch
      actually fixes the problem, since I don't have access to the system
      right now.  However the overflow problem does exist anyway.
      
      Given the description that a system with slightly less than 8TB does
      work, this seems to be a candidate for the observed problem.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 29 Dec, 2015 1 commit
  5. 26 Dec, 2015 1 commit
  6. 24 Dec, 2015 1 commit
    • net: cdc_ncm: avoid changing RX/TX buffers on MTU changes · 1dfddff5
      Committed by Bjørn Mork
      NCM buffer sizes are negotiated with the device independently of
      the network device MTU.  The RX buffers are allocated by the
      usbnet framework based on the rx_urb_size value set by cdc_ncm. A
      single RX buffer can hold a number of MTU sized packets.
      
      The default usbnet change_mtu ndo only modifies rx_urb_size if it
      is equal to hard_mtu.  And the cdc_ncm driver will set rx_urb_size
      and hard_mtu independently of each other, based on dwNtbInMaxSize
      and dwNtbOutMaxSize respectively. It was therefore assumed that
      usbnet_change_mtu() would never touch rx_urb_size.  This failed to
      consider the case where dwNtbInMaxSize and dwNtbOutMaxSize happen
      to be equal.
      
      Fix by implementing an NCM specific change_mtu ndo, modifying the
      netdev MTU without touching the buffer size settings.
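
      The interaction can be illustrated with a small model. This is a sketch
      under stated assumptions, not the driver code: struct nic_model and both
      functions are hypothetical, and the rule that hard_mtu becomes the new MTU
      plus the link-layer header length is an assumption about the default
      handler that the text above does not spell out.

```c
#include <stdint.h>

/* Hypothetical model of the fields named in the commit message. */
struct nic_model {
    uint32_t mtu;
    uint32_t hard_mtu;        /* cdc_ncm derives this from dwNtbOutMaxSize */
    uint32_t rx_urb_size;     /* cdc_ncm derives this from dwNtbInMaxSize */
    uint32_t hard_header_len;
};

/*
 * Model of the default handler: rx_urb_size follows hard_mtu,
 * but only when the two were equal before the change.
 */
static void default_change_mtu(struct nic_model *dev, uint32_t new_mtu)
{
    uint32_t old_hard_mtu = dev->hard_mtu;

    dev->mtu = new_mtu;
    dev->hard_mtu = new_mtu + dev->hard_header_len; /* assumed rule */
    if (dev->rx_urb_size == old_hard_mtu)
        dev->rx_urb_size = dev->hard_mtu;
}

/* Model of an NCM-specific handler: only the netdev MTU changes. */
static void ncm_change_mtu(struct nic_model *dev, uint32_t new_mtu)
{
    dev->mtu = new_mtu;
}
```

      When both negotiated sizes happen to be 16384, default_change_mtu()
      shrinks rx_urb_size along with the MTU, whereas ncm_change_mtu() leaves
      the buffer size untouched.
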
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 23 Dec, 2015 1 commit
  8. 19 Dec, 2015 5 commits
    • include/linux/mmdebug.h: should include linux/bug.h · 1d5cda40
      Committed by James Morse
      mmdebug.h uses BUILD_BUG_ON_INVALID(), assuming someone else included
      linux/bug.h.  Include it ourselves.
      
      This avoids build failures such as:
      
        arch/arm64/include/asm/pgtable.h: In function 'set_pte_at':
        arch/arm64/include/asm/pgtable.h:281:3: error: implicit declaration of function 'BUILD_BUG_ON_INVALID' [-Werror=implicit-function-declaration]
         VM_WARN_ONCE(!pte_young(pte),
      
      Fixes: 02602a18 ("bug: completely remove code generated by disabled VM_BUG_ON()")
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bpf: add bpf_skb_load_bytes helper · 05c74e5e
      Committed by Daniel Borkmann
      When hacking tc programs with eBPF, one of the issues that comes up
      from time to time is loading addresses from headers. In eBPF as in
      classic BPF, we have BPF_LD | BPF_ABS | BPF_{B,H,W} instructions that
      extract a byte, half-word or word out of the skb data through helpers
      such as bpf_load_pointer() (interpreter case).
      
      For example, extracting a whole IPv6 address could possibly look like ...
      
        union v6addr {
          struct {
            __u32 p1;
            __u32 p2;
            __u32 p3;
            __u32 p4;
          };
          __u8 addr[16];
        };
      
        [...]
      
        a.p1 = htonl(load_word(skb, off));
        a.p2 = htonl(load_word(skb, off +  4));
        a.p3 = htonl(load_word(skb, off +  8));
        a.p4 = htonl(load_word(skb, off + 12));
      
        [...]
      
        /* access to a.addr[...] */
      
      This work adds a complementary helper bpf_skb_load_bytes() (we also
      have bpf_skb_store_bytes()) as an alternative. From an eBPF program,
      the same load would look like:
      
        ret = bpf_skb_load_bytes(skb, off, addr, sizeof(addr));
      
      Same verifier restrictions apply as in ffeedafb ("bpf: introduce
      current->pid, tgid, uid, gid, comm accessors") case, where stack memory
      access needs to be statically verified and thus guaranteed to be
      initialized in first use (otherwise verifier cannot tell whether a
      subsequent access to it is valid or not as it's runtime dependent).
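
      To see why a single byte copy can replace the four word loads, here is a
      userspace sketch. It is not the eBPF helper itself: load_word_model() and
      load_bytes_model() are hypothetical stand-ins operating on a plain byte
      buffer, and the -1 error value is an assumption made for the illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the union in the commit message (anonymous struct, C11). */
union v6addr {
    struct {
        uint32_t p1;
        uint32_t p2;
        uint32_t p3;
        uint32_t p4;
    };
    uint8_t addr[16];
};

/* Stand-in for load_word(): read 4 packet bytes, return them in host order. */
static uint32_t load_word_model(const uint8_t *pkt, size_t off)
{
    uint32_t be;

    memcpy(&be, pkt + off, sizeof(be));
    return ntohl(be);
}

/* The word-wise variant from the commit message, on the model above. */
static void load_addr_wordwise(const uint8_t *pkt, size_t off, union v6addr *a)
{
    a->p1 = htonl(load_word_model(pkt, off));
    a->p2 = htonl(load_word_model(pkt, off + 4));
    a->p3 = htonl(load_word_model(pkt, off + 8));
    a->p4 = htonl(load_word_model(pkt, off + 12));
}

/* Stand-in for a byte-range load: bounds check, then one copy. */
static int load_bytes_model(const uint8_t *pkt, size_t pkt_len, size_t off,
                            void *to, size_t len)
{
    if (off > pkt_len || len > pkt_len - off)
        return -1; /* assumed error value for this sketch */
    memcpy(to, pkt + off, len);
    return 0;
}
```

      Both paths leave the same 16 bytes in addr[], since each htonl() undoes
      the host-order conversion done by the word load. The byte-range variant
      also rejects a request that would run past the end of the buffer.
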
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow accepted sockets to be bound to l3mdev domain · 6dd9a14e
      Committed by David Ahern
      Allow accepted sockets to derive their sk_bound_dev_if setting from the
      l3mdev domain in which the packets originated. A sysctl setting is added
      to control the behavior, which is similar to sk_mark and
      sysctl_tcp_fwmark_accept.
      
      This effectively allows a process to have a "VRF-global" listen socket,
      with child sockets bound to the VRF device in which the packet originated.
      A similar behavior can be achieved using sk_mark, but a solution using marks
      is incomplete as it does not handle duplicate addresses in different L3
      domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
      domain provides a complete solution.
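
      The selection rule for a child socket can be written as a tiny model.
      This is only a sketch: the function and its parameters are hypothetical,
      and the condition that the listener itself is unbound (ifindex 0) is an
      assumption about how a "VRF-global" listener is recognised.

```c
/*
 * Hypothetical model: pick the device index an accepted (child) socket
 * is bound to.
 *
 * listener_bound_dev_if  ifindex the listen socket is bound to, 0 if unbound
 * l3mdev_accept          non-zero when the sysctl enables the new behaviour
 * ingress_master_ifindex ifindex of the l3mdev (VRF) device the packet
 *                        arrived through, 0 if none
 */
static int child_bound_dev_if(int listener_bound_dev_if, int l3mdev_accept,
                              int ingress_master_ifindex)
{
    if (listener_bound_dev_if == 0 && l3mdev_accept)
        return ingress_master_ifindex;
    return listener_bound_dev_if;
}
```

      With the sysctl enabled, an unbound listener yields children bound to the
      VRF device of the incoming packet. With it disabled, children keep the
      listener's setting.
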
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: l3mdev: Add master device lookup by index · 1a852479
      Committed by David Ahern
      Add a helper to look up the l3mdev master index given a device index.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: addrconf: use stable address generator for ARPHRD_NONE · cc9da6cc
      Committed by Bjørn Mork
      Add a new address generator mode, using the stable address generator
      with an automatically generated secret. This is intended as a default
      address generator mode for device types with no EUI64 implementation.
      The new generator is used for ARPHRD_NONE interfaces initially, adding
      default IPv6 autoconf support to e.g. tun interfaces.
      
      If the addrgenmode is set to 'random', either by default or manually,
      and no stable secret is available, then a random secret is used as
      input for the stable-privacy address generator.  The secret can be
      read and modified like manually configured secrets, using the proc
      interface.  Modifying the secret will change the addrgen mode to
      'stable-privacy' to indicate that it operates on a known secret.
      
      Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
      a known secret is available when the device is created, then the mode
      will default to 'stable-privacy' as before.  The mode can be manually
      set to 'random' but it will behave exactly like 'stable-privacy' in
      this case. The secret will not change.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 18 Dec, 2015 3 commits
  10. 17 Dec, 2015 1 commit
  11. 16 Dec, 2015 19 commits
  12. 15 Dec, 2015 5 commits
    • net: fix IP early demux races · 5037e9ef
      Committed by Eric Dumazet
      David Wilder reported crashes caused by dst reuse.
      
      <quote David>
        I am seeing a crash on a distro V4.2.3 kernel caused by a double
        release of a dst_entry.  In ipv4_dst_destroy() the call to
        list_empty() finds a poisoned next pointer, indicating the dst_entry
        has already been removed from the list and freed. The crash occurs
        18 to 24 hours into a run of a network stress exerciser.
      </quote>
      
      Thanks to his detailed report and analysis, we were able to understand
      the core issue.
      
      IP early demux can associate a dst to skb, after a lookup in TCP/UDP
      sockets.
      
      When socket cache is not properly set, we want to store into
      sk->sk_dst_cache the dst for future IP early demux lookups,
      by acquiring a stable refcount on the dst.
      
      The problem is that this acquisition simply uses atomic_inc(),
      which works well unless the dst was queued for destruction by
      dst_release() noticing that the dst refcount went to zero, if
      DST_NOCACHE was set on dst.
      
      We need to make sure current refcount is not zero before incrementing
      it, or risk double free as David reported.
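
      The "increment only if non-zero" rule can be sketched in portable C11.
      This is a userspace illustration with hypothetical names (struct obj,
      hold_unconditional(), hold_unless_zero()); it is not the pair of helpers
      added by the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a dst entry. */
struct obj {
    atomic_int refcnt;
};

/* Unsafe pattern: revives an object whose count already reached zero. */
static void hold_unconditional(struct obj *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/*
 * Safe pattern: take a reference only while the count is still non-zero.
 * Returns false when the object is already on its way to being freed.
 */
static bool hold_unless_zero(struct obj *o)
{
    int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

      A caller that gets false from hold_unless_zero() must treat the object as
      gone and not cache a pointer to it, which is what avoids the double free.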
      
      This patch, being a stable candidate, adds two new helpers and uses
      them only from the problematic IP early demux paths.
      
      It might be possible to merge skb_dst_force() and skb_dst_force_safe()
      in net-next, but I prefer having the smallest patch for stable
      kernels: maybe some skb_dst_force() callers do not expect that skb->dst
      can suddenly be cleared.
      
      Can probably be backported back to linux-3.6 kernels.
      Reported-by: David J. Wilder <dwilder@us.ibm.com>
      Tested-by: David J. Wilder <dwilder@us.ibm.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix typo in skb_fclone_busy · bda13fed
      Committed by Masanari Iida
      This patch fixes a typo in a comment in skb_fclone_busy.
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add validation for the socket syscall protocol argument · 79462ad0
      Committed by Hannes Frederic Sowa
      郭永刚 reported that one could simply crash the kernel as root by
      using a simple program:
      
      	int socket_fd;
      	struct sockaddr_in addr;
      	addr.sin_port = 0;
      	addr.sin_addr.s_addr = INADDR_ANY;
      	addr.sin_family = 10;
      
      	socket_fd = socket(10,3,0x40000000);
      	connect(socket_fd , &addr,16);
      
      AF_INET, AF_INET6 sockets actually only support 8-bit protocol
      identifiers. inet_sock's skc_protocol field is sized accordingly,
      so larger protocol identifiers simply have their higher bits cut off;
      in the example above, a zero is stored in the protocol fields.
      
      This could lead to e.g. a NULL function pointer dereference: as a
      result of the cut-off, inet_num is zero and we call down to
      inet_autobind, which is NULL for raw sockets.
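
      The cut-off is easy to check in isolation. The sketch below is a
      userspace illustration, not the patch: PROTO_LIMIT and both functions are
      hypothetical, and the range check shows one plausible shape for the
      validation rather than the code that was merged.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed limit for this sketch: one past the largest 8-bit value. */
#define PROTO_LIMIT 256

/* What an 8-bit field keeps of the protocol argument. */
static uint8_t truncate_protocol(int protocol)
{
    return (uint8_t)protocol;
}

/* Reject values that cannot be represented in the 8-bit field. */
static int validate_protocol(int protocol)
{
    if (protocol < 0 || protocol >= PROTO_LIMIT)
        return -EINVAL;
    return 0;
}
```

      For the reported value 0x40000000, truncate_protocol() returns 0, which
      is how the socket ends up with a zero protocol number, while
      validate_protocol() rejects the value up front.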
      
      kernel: Call Trace:
      kernel:  [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
      kernel:  [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
      kernel:  [<ffffffff81645069>] SYSC_connect+0xd9/0x110
      kernel:  [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
      kernel:  [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
      kernel:  [<ffffffff81645e0e>] SyS_connect+0xe/0x10
      kernel:  [<ffffffff81779515>] tracesys_phase2+0x84/0x89
      
      I found no particular commit which introduced this problem.
      
      CVE: CVE-2015-8543
      Cc: Cong Wang <cwang@twopensource.com>
      Reported-by: 郭永刚 <guoyonggang@360.cn>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: implement xt_cgroup cgroup2 path match · c38c4597
      Committed by Tejun Heo
      This patch implements xt_cgroup path match which matches cgroup2
      membership of the associated socket.  The match is recursive and
      invertible.
      
      For rationales on introducing another cgroup based match, please refer
      to a preceding commit "sock, cgroup: add sock->sk_cgroup".
      
      v3: Folded into xt_cgroup as a new revision interface as suggested by
          Pablo.
      
      v2: Included linux/limits.h from xt_cgroup2.h for PATH_MAX.  Added
          explicit alignment to the priv field.  Both suggested by Jan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Cc: Jan Engelhardt <jengelh@inai.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: prepare xt_cgroup for multi revisions · 4ec8ff0e
      Committed by Tejun Heo
      xt_cgroup will grow cgroup2 path based match.  Postfix existing
      symbols with _v0 and prepare for multi revision registration.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Cc: Jan Engelhardt <jengelh@inai.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>