1. 18 3月, 2013 2 次提交
  2. 15 3月, 2013 2 次提交
  3. 13 3月, 2013 1 次提交
  4. 12 3月, 2013 4 次提交
    • M
      phy: add set_wol/get_wol functions · 42e836eb
      Michael Stapelberg 提交于
      This allows ethernet drivers (such as the mv643xx_eth) to support
      Wake on LAN on platforms where PHY registers have to be configured
      for Wake on LAN (e.g. the Marvell Kirkwood based qnap TS-119P II).
      Signed-off-by: NMichael Stapelberg <michael@stapelberg.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42e836eb
    • N
      tcp: TLP loss detection. · 9b717a8d
      Nandita Dukkipati 提交于
      This is the second of the TLP patch series; it augments the basic TLP
      algorithm with a loss detection scheme.
      
      This patch implements a mechanism for loss detection when a Tail
      loss probe retransmission plugs a hole thereby masking packet loss
      from the sender. The loss detection algorithm relies on counting
      TLP dupacks as outlined in Sec. 3 of:
      http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
      
      The basic idea is: Sender keeps track of TLP "episode" upon
      retransmission of a TLP packet. An episode ends when the sender receives
      an ACK above the SND.NXT (tracked by tlp_high_seq) at the time of the
      episode. We want to make sure that before the episode ends the sender
      receives a "TLP dupack", indicating that the TLP retransmission was
      unnecessary, so there was no loss/hole that needed plugging. If the
      sender gets no TLP dupack before the end of the episode, then it reduces
      ssthresh and the congestion window, because the TLP packet arriving at
      the receiver probably plugged a hole.
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b717a8d
    • N
      tcp: Tail loss probe (TLP) · 6ba8a3b1
      Nandita Dukkipati 提交于
      This patch series implement the Tail loss probe (TLP) algorithm described
      in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
      first patch implements the basic algorithm.
      
      TLP's goal is to reduce tail latency of short transactions. It achieves
      this by converting retransmission timeouts (RTOs) occuring due
      to tail losses (losses at end of transactions) into fast recovery.
      TLP transmits one packet in two round-trips when a connection is in
      Open state and isn't receiving any ACKs. The transmitted packet, aka
      loss probe, can be either new or a retransmission. When there is tail
      loss, the ACK from a loss probe triggers FACK/early-retransmit based
      fast recovery, thus avoiding a costly RTO. In the absence of loss,
      there is no change in the connection state.
      
      PTO stands for probe timeout. It is a timer event indicating
      that an ACK is overdue and triggers a loss probe packet. The PTO value
      is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
      ACK timer when there is only one oustanding packet.
      
      TLP Algorithm
      
      On transmission of new data in Open state:
        -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
        -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
        -> PTO = min(PTO, RTO)
      
      Conditions for scheduling PTO:
        -> Connection is in Open state.
        -> Connection is either cwnd limited or no new data to send.
        -> Number of probes per tail loss episode is limited to one.
        -> Connection is SACK enabled.
      
      When PTO fires:
        new_segment_exists:
          -> transmit new segment.
          -> packets_out++. cwnd remains same.
      
        no_new_packet:
          -> retransmit the last segment.
             Its ACK triggers FACK or early retransmit based recovery.
      
      ACK path:
        -> rearm RTO at start of ACK processing.
        -> reschedule PTO if need be.
      
      In addition, the patch includes a small variation to the Early Retransmit
      (ER) algorithm, such that ER and TLP together can in principle recover any
      N-degree of tail loss through fast recovery. TLP is controlled by the same
      sysctl as ER, tcp_early_retrans sysctl.
      tcp_early_retrans==0; disables TLP and ER.
      		 ==1; enables RFC5827 ER.
      		 ==2; delayed ER.
      		 ==3; TLP and delayed ER. [DEFAULT]
      		 ==4; TLP only.
      
      The TLP patch series have been extensively tested on Google Web servers.
      It is most effective for short Web trasactions, where it reduced RTOs by 15%
      and improved HTTP response time (average by 6%, 99th percentile by 10%).
      The transmitted probes account for <0.5% of the overall transmissions.
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ba8a3b1
    • H
      phy/micrel: Add support for KSZ8031 · b818d1a7
      Hector Palacios 提交于
      Micrel PHY KSZ8031 is similar to KSZ8021 and also requires the special
      initialization of "Operation Mode Strap Override" in reg 0x16
      introduced in 212ea99a (phy/micrel: Implement support for KSZ8021).
      Signed-off-by: NHector Palacios <hector.palacios@digi.com>
      Reviewed-by: NMarek Vasut <marex@denx.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b818d1a7
  5. 11 3月, 2013 2 次提交
  6. 10 3月, 2013 4 次提交
  7. 09 3月, 2013 3 次提交
  8. 08 3月, 2013 4 次提交
  9. 07 3月, 2013 3 次提交
  10. 06 3月, 2013 1 次提交
  11. 04 3月, 2013 4 次提交
    • R
      ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type · 92414481
      Rafael J. Wysocki 提交于
      After PCI and USB have stopped using the .find_bridge() callback in
      struct acpi_bus_type, the only remaining user of it is SATA, but SATA
      only pretends to be a user, because it points that callback to a stub
      always returning -ENODEV.
      
      For this reason, drop the SATA's dummy .find_bridge() callback and
      remove .find_bridge(), which is not used any more, from struct
      acpi_bus_type entirely.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NJeff Garzik <jgarzik@pobox.com>
      92414481
    • R
      ACPI / glue: Add .match() callback to struct acpi_bus_type · 53540098
      Rafael J. Wysocki 提交于
      USB uses the .find_bridge() callback from struct acpi_bus_type
      incorrectly, because as a result of the way it is used by USB every
      device in the system that doesn't have a bus type or parent is
      passed to usb_acpi_find_device() for inspection.
      
      What USB actually needs, though, is to call usb_acpi_find_device()
      for USB ports that don't have a bus type defined, but have
      usb_port_device_type as their device type, as well as for USB
      devices.
      
      To fix that replace the struct bus_type pointer in struct
      acpi_bus_type used for matching devices to specific subsystems
      with a .match() callback to be used for this purpose and update
      the users of struct acpi_bus_type, including USB, accordingly.
      Define the .match() callback routine for USB, usb_acpi_bus_match(),
      in such a way that it will cover both USB devices and USB ports
      and remove the now redundant .find_bridge() callback pointer from
      usb_acpi_bus.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NJeff Garzik <jgarzik@pobox.com>
      53540098
    • K
      eCryptfs: allow userspace messaging to be disabled · 290502be
      Kees Cook 提交于
      When the userspace messaging (for the less common case of userspace key
      wrap/unwrap via ecryptfsd) is not needed, allow eCryptfs to build with
      it removed. This saves on kernel code size and reduces potential attack
      surface by removing the /dev/ecryptfs node.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      290502be
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  12. 03 3月, 2013 7 次提交
    • J
      mm: define VM_GROWSUP for CONFIG_METAG · 9ca52ed9
      James Hogan 提交于
      Commit cc2383ec ("mm: introduce
      arch-specific vma flag VM_ARCH_1") merged in v3.7-rc1.
      
      The above commit combined several arch-specific vma flags into one, and
      in the process it changed the VM_GROWSUP definition to depend on
      specific architectures rather than CONFIG_STACK_GROWSUP. Therefore add
      an ifdef for CONFIG_METAG to also set VM_GROWSUP.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-mm@kvack.org
      9ca52ed9
    • J
      metag: Internal and external irqchips · 5698c50d
      James Hogan 提交于
      Meta core internal interrupts (from HWSTATMETA and friends) are vectored
      onto the TR1 core trigger for the current thread. This is demultiplexed
      in irq-metag.c to individual Linux IRQs for each internal interrupt.
      
      External SoC interrupts (from HWSTATEXT and friends) are vectored onto
      the TR2 core trigger for the current thread. This is demultiplexed in
      irq-metag-ext.c to individual Linux IRQs for each external SoC interrupt.
      The external irqchip has devicetree bindings for configuring the number
      of irq banks and the type of masking available.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Dom Cobley <popcornmix@gmail.com>
      Cc: Simon Arlott <simon@fire.lp0.eu>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      5698c50d
    • J
      metag: Time keeping · a2c5d4ed
      James Hogan 提交于
      Add time keeping code for metag. Meta hardware threads have 2 timers.
      The background timer (TXTIMER) is used as a free-running time base, and
      the interrupt timer (TXTIMERI) is used for the timer interrupt. Both
      counters traditionally count at approximately 1MHz.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a2c5d4ed
    • J
      metag: ptrace · bc3966bf
      James Hogan 提交于
      The ptrace interface for metag provides access to some core register
      sets using the PTRACE_GETREGSET and PTRACE_SETREGSET operations. The
      details of the internal context structures is abstracted into user API
      structures to both ease use and allow flexibility to change the internal
      context layouts. Copyin and copyout functions for these register sets
      are exposed to allow signal handling code to use them to copy to and
      from the signal context.
      
      struct user_gp_regs (NT_PRSTATUS) provides access to the core general
      purpose register context.
      
      struct user_cb_regs (NT_METAG_CBUF) provides access to the TXCATCH*
      registers which contains information abuot a memory fault, unaligned
      access error or watchpoint. This can be modified to alter the way the
      fault is replayed on resume ("catch replay"), or to prevent the replay
      taking place.
      
      struct user_rp_state (NT_METAG_RPIPE) provides access to the state of
      the Meta read pipeline which can be used to hide memory latencies in
      hand optimised data loops.
      
      Extended DSP register state, DSP RAM, and hardware breakpoint registers
      aren't yet exposed through ptrace.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      bc3966bf
    • J
      asm-generic/unistd.h: handle symbol prefixes in cond_syscall · 4dd3c959
      James Hogan 提交于
      Some architectures have symbol prefixes and set CONFIG_SYMBOL_PREFIX,
      but this wasn't taken into account by the generic cond_syscall. It's
      easy enough to fix in a generic fashion, so add the symbol prefix to
      symbol names in cond_syscall when CONFIG_SYMBOL_PREFIX is set.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMike Frysinger <vapier@gentoo.org>
      4dd3c959
    • J
      asm-generic/io.h: check CONFIG_VIRT_TO_BUS · c93d0312
      James Hogan 提交于
      Make asm-generic/io.h check CONFIG_VIRT_TO_BUS before defining
      virt_to_bus() and bus_to_virt(), otherwise it's easy to accidentally
      have a silently failing incorrect direct mapped definition rather then
      no definition at all.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      c93d0312
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  13. 02 3月, 2013 3 次提交