1. 01 10月, 2014 2 次提交
    • L
      of/pci: Fix the conversion of IO ranges into IO resources · 0b0b0893
      Liviu Dudau 提交于
      The ranges property for a host bridge controller in DT describes the
      mapping between the PCI bus address and the CPU physical address.  The
      resources framework however expects that the IO resources start at a pseudo
      "port" address 0 (zero) and have a maximum size of IO_SPACE_LIMIT.  The
      conversion from PCI ranges to resources failed to take that into account,
      returning a CPU physical address instead of a port number.
      
      Also fix all the drivers that depend on the old behaviour by fetching the
      CPU physical address based on the port number where it is being needed.
      Signed-off-by: NLiviu Dudau <Liviu.Dudau@arm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      CC: Grant Likely <grant.likely@linaro.org>
      CC: Rob Herring <robh+dt@kernel.org>
      CC: Arnd Bergmann <arnd@arndb.de>
      CC: Thierry Reding <thierry.reding@gmail.com>
      CC: Simon Horman <horms@verge.net.au>
      CC: Catalin Marinas <catalin.marinas@arm.com>
      0b0b0893
    • L
      of/pci: Move of_pci_range_to_resource() to of/address.c · 83bbde1c
      Liviu Dudau 提交于
      We need to enhance of_pci_range_to_resources() enough that it won't make
      sense for it to be inline anymore.  Move it to drivers/of/address.c, under
      #ifdef CONFIG_PCI.
      
      of_address.h previously implemented of_pci_range_to_resources()
      unconditionally, regardless of any config options.  The implementation in
      address.c is defined only when CONFIG_OF_ADDRESS=y and CONFIG_PCI=y,
      so add a dummy version to avoid build errors when CONFIG_OF or
      CONFIG_OF_ADDRESS is not defined.
      
      [bhelgaas: drop extra detail from changelog, move def under CONFIG_PCI,
      add dummy of_pci_range_to_resource() for build errors (from Arnd)]
      Signed-off-by: NLiviu Dudau <Liviu.Dudau@arm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: Grant Likely <grant.likely@linaro.org>
      CC: Rob Herring <robh+dt@kernel.org>
      CC: Arnd Bergmann <arnd@arndb.de>
      CC: Catalin Marinas <catalin.marinas@arm.com>
      83bbde1c
  2. 30 9月, 2014 2 次提交
  3. 23 8月, 2014 2 次提交
    • W
      nfs: don't sleep with inode lock in lock_and_join_requests · 7c3af975
      Weston Andros Adamson 提交于
      This handles the 'nonblock=false' case in nfs_lock_and_join_requests.
      If the group is already locked and blocking is allowed, drop the inode lock
      and wait for the group lock to be cleared before trying it all again.
      This should fix warnings found in peterz's tree (sched/wait branch), where
      might_sleep() checks are added to wait.[ch].
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
      Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      7c3af975
    • S
      ftrace: Allow ftrace_ops to use the hashes from other ops · 33b7f99c
      Steven Rostedt (Red Hat) 提交于
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      
      But that assumption proved to be incorrect. The function profiler
      can use the function graph tracer when function tracing is enabled.
      Since all function graph users uses the function tracing ftrace_ops
      this causes a conflict and when a user enables both function profiling
      as well as the function tracer it will crash ftrace and disable it.
      
      The quick solution so far is to move them as separate ftrace_ops like
      it was earlier. The problem though is to synchronize the functions that
      are traced because both function and function_graph tracer are limited
      by the selections made in the set_ftrace_filter and set_ftrace_notrace
      files.
      
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      33b7f99c
  4. 22 8月, 2014 3 次提交
  5. 21 8月, 2014 1 次提交
  6. 19 8月, 2014 3 次提交
  7. 17 8月, 2014 1 次提交
  8. 15 8月, 2014 6 次提交
    • T
      rhashtable: fix annotations for rht_for_each_entry_rcu() · 93f56081
      Thomas Graf 提交于
      Call rcu_deference_raw() directly from within rht_for_each_entry_rcu()
      as list_for_each_entry_rcu() does.
      
      Fixes the following sparse warnings:
      net/netlink/af_netlink.c:2906:25:    expected struct rhash_head const *__mptr
      net/netlink/af_netlink.c:2906:25:    got struct rhash_head [noderef] <asn:4>*<noident>
      
      Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93f56081
    • T
      rhashtable: unexport and make rht_obj() static · c91eee56
      Thomas Graf 提交于
      No need to export rht_obj(), all inner to outer object translations
      occur internally. It was intended to be used with rht_for_each() which
      now primarily serves as the iterator for rhashtable_remove_pprev() to
      effectively flush and free the full table.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c91eee56
    • T
      rhashtable: RCU annotations for next pointers · 5300fdcb
      Thomas Graf 提交于
      Properly annotate next pointers as access is RCU protected in
      the lookup path.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5300fdcb
    • H
      tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic · a26552af
      Hannes Frederic Sowa 提交于
      tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
      ordering of incoming connections and teardowns without the need to
      hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
      the last measured RTO. To do so, we keep the last seen timestamp in a
      per-host indexed data structure and verify if the incoming timestamp
      in a connection request is strictly greater than the saved one during
      last connection teardown. Thus we can verify later on that no old data
      packets will be accepted by the new connection.
      
      During moving a socket to time-wait state we already verify if timestamps
      where seen on a connection. Only if that was the case we let the
      time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
      will be used. But we don't verify this on incoming SYN packets. If a
      connection teardown was less than TCP_PAWS_MSL seconds in the past we
      cannot guarantee to not accept data packets from an old connection if
      no timestamps are present. We should drop this SYN packet. This patch
      closes this loophole.
      
      Please note, this patch does not make tcp_tw_recycle in any way more
      usable but only adds another safety check:
      Sporadic drops of SYN packets because of reordering in the network or
      in the socket backlog queues can happen. Users behing NAT trying to
      connect to a tcp_tw_recycle enabled server can get caught in blackholes
      and their connection requests may regullary get dropped because hosts
      behind an address translator don't have synchronized tcp timestamp clocks.
      tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
      
      In general, use of tcp_tw_recycle is disadvised.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a26552af
    • N
      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 4fab9071
      Neal Cardwell 提交于
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Diagnosed-by: NWillem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fab9071
    • A
      tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 9d186cac
      Andrey Vagin 提交于
      We don't know right timestamp for repaired skb-s. Wrong RTT estimations
      isn't good, because some congestion modules heavily depends on it.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
          where it's set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d186cac
  9. 13 8月, 2014 3 次提交
  10. 12 8月, 2014 1 次提交
    • V
      net: Always untag vlan-tagged traffic on input. · 0d5501c1
      Vlad Yasevich 提交于
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is build only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where a
      sender is a VM with a VLAN configued.  The host VM is running on
      doest not have VLAN support and the outging interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I've only access to the TG3 hardware in this configuration thus have
      only tested this with TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffere from a similar issue.
      
      The patch attempt to fix this another way.  It moves the vlan header
      stipping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualizatoin host, the virtual machines running on top of such
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d5501c1
  11. 11 8月, 2014 3 次提交
  12. 10 8月, 2014 2 次提交
  13. 09 8月, 2014 11 次提交
    • A
      drm/ttm: expose CPU address of DMA-allocated pages · 3d50d4dc
      Alexandre Courbot 提交于
      Pages allocated using the DMA API have a coherent memory mapping. Make
      this mapping visible to drivers so they can decide to use it instead of
      creating their own redundant one.
      Signed-off-by: NAlexandre Courbot <acourbot@nvidia.com>
      Acked-by: NDavid Airlie <airlied@linux.ie>
      Signed-off-by: NBen Skeggs <bskeggs@redhat.com>
      3d50d4dc
    • V
      kexec: verify the signature of signed PE bzImage · 8e7d8381
      Vivek Goyal 提交于
      This is the final piece of the puzzle of verifying kernel image signature
      during kexec_file_load() syscall.
      
      This patch calls into PE file routines to verify signature of bzImage.  If
      signature are valid, kexec_file_load() succeeds otherwise it fails.
      
      Two new config options have been introduced.  First one is
      CONFIG_KEXEC_VERIFY_SIG.  This option enforces that kernel has to be
      validly signed otherwise kernel load will fail.  If this option is not
      set, no signature verification will be done.  Only exception will be when
      secureboot is enabled.  In that case signature verification should be
      automatically enforced when secureboot is enabled.  But that will happen
      when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      
      I tested these patches with both "pesign" and "sbsign" signed bzImages.
      
      I used signing_key.priv key and signing_key.x509 cert for signing as
      generated during kernel build process (if module signing is enabled).
      
      Used following method to sign bzImage.
      
      pesign
      ======
      - Convert DER format cert to PEM format cert
      openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform
      PEM
      
      - Generate a .p12 file from existing cert and private key file
      openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in
      signing_key.x509.PEM
      
      - Import .p12 file into pesign db
      pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign
      
      - Sign bzImage
      pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign
      -c "Glacier signing key - Magrathea" -s
      
      sbsign
      ======
      sbsign --key signing_key.priv --cert signing_key.x509.PEM --output
      /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+
      
      Patch details:
      
      Well all the hard work is done in previous patches.  Now bzImage loader
      has just call into that code and verify whether bzImage signature are
      valid or not.
      
      Also create two config options.  First one is CONFIG_KEXEC_VERIFY_SIG.
      This option enforces that kernel has to be validly signed otherwise kernel
      load will fail.  If this option is not set, no signature verification will
      be done.  Only exception will be when secureboot is enabled.  In that case
      signature verification should be automatically enforced when secureboot is
      enabled.  But that will happen when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8e7d8381
    • V
      kexec: support kexec/kdump on EFI systems · 6a2c20e7
      Vivek Goyal 提交于
      This patch does two things.  It passes EFI run time mappings to second
      kernel in bootparams efi_info.  Second kernel parse this info and create
      new mappings in second kernel.  That means mappings in first and second
      kernel will be same.  This paves the way to enable EFI in kexec kernel.
      
      This patch also prepares and passes EFI setup data through bootparams.
      This contains bunch of information about various tables and their
      addresses.
      
      These information gathering and passing has been written along the lines
      of what current kexec-tools is doing to make kexec work with UEFI.
      
      [akpm@linux-foundation.org: s/get_efi/efi_get/g, per Matt]
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a2c20e7
    • V
      kexec-bzImage64: support for loading bzImage using 64bit entry · 27f48d3e
      Vivek Goyal 提交于
      This is loader specific code which can load bzImage and set it up for
      64bit entry.  This does not take care of 32bit entry or real mode entry.
      
      32bit mode entry can be implemented if somebody needs it.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27f48d3e
    • V
      kexec: load and relocate purgatory at kernel load time · 12db5562
      Vivek Goyal 提交于
      Load purgatory code in RAM and relocate it based on the location.
      Relocation code has been inspired by module relocation code and purgatory
      relocation code in kexec-tools.
      
      Also compute the checksums of loaded kexec segments and store them in
      purgatory.
      
      Arch independent code provides this functionality so that arch dependent
      bootloaders can make use of it.
      
      Helper functions are provided to get/set symbol values in purgatory which
      are used by bootloaders later to set things like stack and entry point of
      second kernel etc.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12db5562
    • V
      kexec: implementation of new syscall kexec_file_load · cb105258
      Vivek Goyal 提交于
      Previous patch provided the interface definition and this patch prvides
      implementation of new syscall.
      
      Previously segment list was prepared in user space.  Now user space just
      passes kernel fd, initrd fd and command line and kernel will create a
      segment list internally.
      
      This patch contains generic part of the code.  Actual segment preparation
      and loading is done by arch and image specific loader.  Which comes in
      next patch.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb105258
    • V
      kexec: new syscall kexec_file_load() declaration · f0895685
      Vivek Goyal 提交于
      This is the new syscall kexec_file_load() declaration/interface.  I have
      reserved the syscall number only for x86_64 so far.  Other architectures
      (including i386) can reserve syscall number when they enable the support
      for this new syscall.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0895685
    • V
      kexec: make kexec_segment user buffer pointer a union · 815d5704
      Vivek Goyal 提交于
      So far kexec_segment->buf was always a user space pointer as user space
      passed the array of kexec_segment structures and kernel copied it.
      
      But with new system call, list of kexec segments will be prepared by
      kernel and kexec_segment->buf will point to a kernel memory.
      
      So while I was adding code where I made assumption that ->buf is pointing
      to kernel memory, sparse started giving warning.
      
      Make ->buf a union.  And where a user space pointer is expected, access it
      using ->buf and where a kernel space pointer is expected, access it using
      ->kbuf.  That takes care of sparse warnings.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      815d5704
    • V
      resource: provide new functions to walk through resources · 8c86e70a
      Vivek Goyal 提交于
      I have added two more functions to walk through resources.
      
      Currently walk_system_ram_range() deals with pfn and /proc/iomem can
      contain partial pages.  By dealing in pfn, callback function loses the
      info that last page of a memory range is a partial page and not the full
      page.  So I implemented walk_system_ram_res() which returns u64 values to
      callback functions and now it properly return start and end address.
      
      walk_system_ram_range() uses find_next_system_ram() to find the next ram
      resource.  This in turn only travels through siblings of top level child
      and does not travers through all the nodes of the resoruce tree.  I also
      need another function where I can walk through all the resources, for
      example figure out where "GART" aperture is.  Figure out where ACPI memory
      is.
      
      So I wrote another function walk_iomem_res() which walks through all
      /proc/iomem resources and returns matches as asked by caller.  Caller can
      specify "name" of resource, start and end and flags.
      
      Got rid of find_next_system_ram_res() and instead implemented more generic
      find_next_iomem_res() which can be used to traverse top level children
      only based on an argument.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8c86e70a
    • V
      kexec: rename unusebale_pages to unusable_pages · 7d3e2bca
      Vivek Goyal 提交于
      Let's use the more common "unusable".
      
      This patch was originally written and posted by Boris. I am including it
      in this patch series.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d3e2bca
    • D
      shm: add memfd_create() syscall · 9183df25
      David Herrmann 提交于
      memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
      that you can pass to mmap().  It can support sealing and avoids any
      connection to user-visible mount-points.  Thus, it's not subject to quotas
      on mounted file-systems, but can be used like malloc()'ed memory, but with
      a file-descriptor to it.
      
      memfd_create() returns the raw shmem file, so calls like ftruncate() can
      be used to modify the underlying inode.  Also calls like fstat() will
      return proper information and mark the file as regular file.  If you want
      sealing, you can specify MFD_ALLOW_SEALING.  Otherwise, sealing is not
      supported (like on all other regular files).
      
      Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
      subject to a filesystem size limit.  It is still properly accounted to
      memcg limits, though, and to the same overcommit or no-overcommit
      accounting as all user memory.
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ryan Lortie <desrt@desrt.ca>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Daniel Mack <zonque@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9183df25