1. 08 3月, 2013 22 次提交
    • D
      sfc: allocate more RX buffers per page · 1648a23f
      Daniel Pieczko 提交于
      Allocating 2 buffers per page is insanely inefficient when MTU is 1500
      and PAGE_SIZE is 64K (as it usually is on POWER).  Allocate as many as
      we can fit, and choose the refill batch size at run-time so that we
      still always use a whole page at once.
      
      [bwh: Fix loop condition to allow for compound pages; rebase]
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      1648a23f
    • B
      sfc: Replace efx_rx_is_last_buffer() with a flag · 179ea7f0
      Ben Hutchings 提交于
      This condition is brittle and we have lots of flags to spare.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      179ea7f0
    • D
      sfc: reuse pages to avoid DMA mapping/unmapping costs · 2768935a
      Daniel Pieczko 提交于
      On POWER systems, DMA mapping/unmapping operations are very expensive.
      These changes reduce these costs by trying to reuse DMA mapped pages.
      
      After all the buffers associated with a page have been processed and
      passed up, the page is placed into a ring (if there is room).  For
      each page that is required for a refill operation, a page in the ring
      is examined to determine if its page count has fallen to 1, ie. the
      kernel has released its reference to these packets.  If this is the
      case, the page can be immediately added back into the RX descriptor
      ring, without having to re-map it for DMA.
      
      If the kernel is still holding a reference to this page, it is removed
      from the ring and unmapped for DMA.  Then a new page, which can
      immediately be used by RX buffers in the descriptor ring, is allocated
      and DMA mapped.
      
      The time a page needs to spend in the recycle ring before the kernel
      has released its page references is based on the number of buffers
      that use this page.  As large pages can hold more RX buffers, the RX
      recycle ring can be shorter.  This reduces memory usage on POWER
      systems, while maintaining the performance gain achieved by recycling
      pages, following the driver change to pack more than two RX buffers
      into large pages.
      
      When an IOMMU is not present, the recycle ring can be small to reduce
      memory usage, since DMA mapping operations are inexpensive.
      
      With a small recycle ring, attempting to refill the descriptor queue
      with more buffers than the equivalent size of the recycle ring could
      ultimately lead to memory leaks if page entries in the recycle ring
      were overwritten.  To prevent this, the check to see if the recycle
      ring is full is changed to check if the next entry to be written is
      NULL.
      
      [bwh: Combine and rebase several commits so this is complete
       before the following buffer-packing changes.  Remove module
       parameter.]
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      2768935a
    • B
      sfc: Enable RX DMA scattering where possible · 85740cdf
      Ben Hutchings 提交于
      Enable RX DMA scattering iff an RX buffer large enough for the current
      MTU will not fit into a single page and the NIC supports DMA
      scattering for kernel-mode RX queues.
      
      On Falcon and Siena, the RX_USR_BUF_SIZE field is used as the DMA
      limit for both all RX queues with scatter enabled.  Set it to 1824,
      matching what Onload uses now.
      
      Maintain a statistic for frames truncated due to lack of descriptors
      (rx_nodesc_trunc).  This is distinct from rx_frm_trunc which may be
      incremented when scattering is disabled and implies an over-length
      frame.
      
      Whenever an MTU change causes scattering to be turned on or off,
      update filters that point to the PF queues, but leave others
      unchanged, as VF drivers assume scattering is off.
      
      Add n_frags parameters to various functions, and make them iterate:
      - efx_rx_packet()
      - efx_recycle_rx_buffers()
      - efx_rx_mk_skb()
      - efx_rx_deliver()
      
      Make efx_handle_rx_event() responsible for updating
      efx_rx_queue::removed_count.
      
      Change the RX pipeline state to a starting ring index and number of
      fragments, and make __efx_rx_packet() responsible for clearing it.
      
      Based on earlier versions by David Riddoch and Jon Cooper.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      85740cdf
    • B
      sfc: Update RX buffer address together with length · b74e3e8c
      Ben Hutchings 提交于
      Adjust rx_buf->page_offset when we eat the RX hash prefix.  Remove
      efx_rx_buf_offset(), which is now redundant.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      b74e3e8c
    • B
      sfc: Explicitly prefetch RX hash prefix, not just Ethernet heade · 5036b7c7
      Ben Hutchings 提交于
      Currently we prefetch from the Ethernet header, but we will also read
      the hash prefix.  In practice they should be in the same cache line
      and this won't hurt, but it is still pointless to add on the hash
      prefix size.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      5036b7c7
    • B
      sfc: Replace efx_rx_buf_eh() with simpler efx_rx_buf_va() · b184f16b
      Ben Hutchings 提交于
      efx_rx_buf_va() returns the virtual address of the current start of
      the buffer.  The callers must add the hash prefix size themselves.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      b184f16b
    • B
      sfc: Wrap __efx_rx_packet() with efx_rx_flush_packet() · ff734ef4
      Ben Hutchings 提交于
      The pipeline mechanism will need to change a bit for scattered
      packets.  Add a wrapper to insulate efx_process_channel() from this.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      ff734ef4
    • B
    • B
      sfc: Properly distinguish RX buffer and DMA lengths · 272baeeb
      Ben Hutchings 提交于
      Replace efx_nic::rx_buffer_len with efx_nic::rx_dma_len, the maximum
      RX DMA length.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      272baeeb
    • B
    • A
      sfc: Add AER and EEH support for Siena · 626950db
      Alexandre Rames 提交于
      The Linux side of EEH is triggered by MMIO reads, but this
      driver's data path does not issue any MMIO reads (except in
      legacy interrupt mode).  Therefore add a monitor function
      to poll EEH periodically.
      
      When preparing to reset the device based on our own error
      detection, also poll EEH and defer to its recovery mechanism
      if appropriate.
      
      [bwh: Use a separate condition for the initial link poll; fix some
       style errors]
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      626950db
    • B
      sfc: Disable RSS when using SR-IOV and only 1 RX queue on the PF · 634ab72c
      Ben Hutchings 提交于
      On Siena, VFs share RSS configuration with the PF.  We attempted to
      support configurations where the PF only uses 1 RX queue and VFs use
      multiple RX queues, by (1) setting up RSS for the number of RX queues
      per VF (2) disabling RSS in the PF's RX default filters.
      
      Unfortunately commit cd2d5b52 ('sfc: Add SR-IOV back-end support
      for SFC9000 family') only included (1).  This is (2).
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      634ab72c
    • B
      sfc: Fix replacement detection in efx_filter_insert_filter() · d9ccfdd4
      Ben Hutchings 提交于
      efx_filter_insert_filter() uses the first table entry in the hash chain
      that either has the same match values or is empty.  This means that
      replacement doesn't always work correctly:
      
      1. Insert filter F1 with match values M1, hashing to H1, at first
         possible entry E1.
      2. Insert filter F2 with match values M2, hashing to H1, at second
         possible entry E2.
      3. Remove filter F1.
      4. Insert filter F3 with match values M2, hashing to H1, at first
         possible entry E1.
      
      F3 should have either replaced F2 or been rejected (depending on
      priority and the replace_equal parameter).
      
      Instead, search for both a matching filter that the inserted filter
      would replace, and an available insertion point, up to the applicable
      maximum search depths.  If we insert at lower depth than a replaced
      filter, clear the replaced filter.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      d9ccfdd4
    • B
      sfc: Merge efx_filter_search() into efx_filter_insert() · 297891ce
      Ben Hutchings 提交于
      efx_filter_search() is only called from efx_filter_insert(), and
      neither function is very long.  The following bug fix requires a more
      sophisticated search with a third result, which is going to be easier
      to implement as part of the same function.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      297891ce
    • B
      sfc: Don't use efx_filter_{build,hash,increment}() for default MAC filters · 385904f8
      Ben Hutchings 提交于
      These functions happen to work for default MAC filters: they generate
      an initial index of 1/0 for unicast/multicast respectively and an
      increment of 1 for either, so a search succeeds at depth 2.  But this
      is a matter of luck rather than design, and it really won't work well
      with the bug fix we're about to do.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      385904f8
    • B
      sfc: Remove redundant parameter to efx_filter_search() · e3a699fa
      Ben Hutchings 提交于
      The 'for_insert' parameter is redundant since there are no longer
      any other operations that need to search based on a filter spec.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      e3a699fa
    • B
      sfc: More sensible semantics for efx_filter_insert_filter() replace flag · 7de07a4d
      Ben Hutchings 提交于
      The 'replace' flag to efx_filter_insert_filter() controls whether the
      new filter may replace *any* filter, and is checked even before
      priority comparison.  But lower-priority filters should never
      block insertion of higher-priority filters.
      
      Change the priority checking so that lower-priority filters are
      replaced regardless of the value of the flag, and rename the
      flag to 'replace_equal'.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      7de07a4d
    • A
      sfc: Remove rx_alloc_method SKB · 97d48a10
      Alexandre Rames 提交于
      [bwh: Remove more dead code, and make efx_ptp_rx() pull the data it
       needs into the header area.]
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      97d48a10
    • L
    • L
      sfc: PTP changes to support improved UUID filtering mode · c939a316
      Laurence Evans 提交于
      There is a long-standing problem with the packet-timestamp matching in
      the driver. When a PTP packet is received by the MC, the FPGA
      timestamps the packet and the MC sends the timestamp and 6 bytes of
      the UUID to the driver. The driver then matches the timestamp against
      received packets using the same 6 bytes of UUID.
      
      The problem comes from the choice of which 6 bytes to use. The PTP
      spec is slightly contradictory and misleading in one of the two places
      where the UUIDs are discussed. From section 7.2.2.2 of the spec, a
      PTPD2 UUID can be either a EUI-64 or a EUI-64 constructed from a
      EUI-48. The typical ethernet based implementation uses a EUI-64
      constructed from a EUI-48. This works by taking the first 3 bytes of
      the MAC address of the NIC being used for PTP (the OUI), then
      inserting 0xFF, 0xFE, then taking the last 3 bytes of the MAC address
      giving
                MAC[0], MAC[1], MAC[2], 0xFF, 0xFE, MAC[3], MAC[4], MAC[5]
      The current MC firmware and driver discard the first two bytes of this
      UUID and packets are matched against timestamps using bytes 2 to 7 so
      there is a small risk that in a deployment of Solarflare PTP NICs used
      with other vendors NICs, that a PTP packet could be matched against
      the wrong timestamp. This applies to all other organisations whose
      third byte of the OUI is 0x53. It's a long list but I notice that it
      includes Cisco.
      
      The necessary modifications to use bytes 0-2 and 5-7 of the UUID to
      match against are quite small but introduce incompatibility between
      older version of the firmware and driver.
      
      When PTP is enabled via SO_TIMESTAMPING specifying PTP V2, the driver
      will try to enable PTP in the firmware using the enhanced mode
      (above). If the firmware returns an error, the driver will enable PTP
      in the firmware using the old mode.
      
      [bwh: Fix some style errors; remove private ioctl bits]
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      c939a316
    • B
      sfc: Allow efx_channel_type::receive_skb() to reject a packet · 4a74dc65
      Ben Hutchings 提交于
      Instead of having efx_ptp_rx() call netif_receive_skb() for an invalid
      PTP packet, make it return false for rejected packets and have
      efx_rx_deliver() pass them up.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      4a74dc65
  2. 07 3月, 2013 5 次提交
  3. 06 3月, 2013 4 次提交
  4. 05 3月, 2013 4 次提交
    • C
      drivers/tty/hvc: Use strlcpy instead of strncpy · 9276dfd2
      Chen Gang 提交于
      when strlen pi->location_code is larger than HVCS_CLC_LENGTH + 1,
          original implementation can not let hvcsd->p_location_code NUL terminated.
        so need fix it (also can simplify the code)
      Signed-off-by: NChen Gang <gang.chen@asianux.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9276dfd2
    • R
      hw_random: make buffer usable in scatterlist. · f7f154f1
      Rusty Russell 提交于
      virtio_rng feeds the randomness buffer handed by the core directly
      into the scatterlist, since commit bb347d98.
      
      However, if CONFIG_HW_RANDOM=m, the static buffer isn't a linear address
      (at least on most archs).  We could fix this in virtio_rng, but it's actually
      far easier to just do it in the core as virtio_rng would have to allocate
      a buffer every time (it doesn't know how much the core will want to read).
      Reported-by: NAurelien Jarno <aurelien@aurel32.net>
      Tested-by: NAurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      f7f154f1
    • F
      net: fec: fix build error in no MXC platform · acac8406
      Frank Li 提交于
      build error cause by
      Commit ff43da86
      ("NET: FEC: dynamtic check DMA desc buff type")
      
      drivers/net/ethernet/freescale/fec.c: In function ‘fec_enet_get_nextdesc’:
      drivers/net/ethernet/freescale/fec.c:215:18: error: invalid use of undefined type ‘struct bufdesc_ex’
      drivers/net/ethernet/freescale/fec.c: In function ‘fec_enet_get_prevdesc’:
      drivers/net/ethernet/freescale/fec.c:224:18: error: invalid use of undefined type ‘struct bufdesc_ex’
      drivers/net/ethernet/freescale/fec.c: In function ‘fec_enet_start_xmit’:
      drivers/net/ethernet/freescale/fec.c:286:37: error: arithmetic on pointer to an incomplete type
      drivers/net/ethernet/freescale/fec.c:287:13: error: arithmetic on pointer to an incomplete type
      drivers/net/ethernet/freescale/fec.c:324:7: error: dereferencing pointer to incomplete type etc....
      Signed-off-by: NFrank Li <Frank.Li@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acac8406
    • F
      net: fec: put tx to napi poll function to fix dead lock · de5fb0a0
      Frank Li 提交于
      up stack ndo_start_xmit already hold lock.
      fec_enet_start_xmit needn't spin lock.
      stat_xmit just update fep->cur_tx
      fec_enet_tx just update fep->dirty_tx
      
      Reserve a empty bdb to check full or empty
      cur_tx == dirty_tx    means full
      cur_tx == dirty_tx +1 means empty
      
      So needn't is_full variable.
      
      Fix spin lock deadlock
      
      =================================
      [ INFO: inconsistent lock state ]
      3.8.0-rc5+ #107 Not tainted
      ---------------------------------
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      ptp4l/615 [HC1[1]:SC0[0]:HE0:SE1] takes:
       (&(&list->lock)->rlock#3){?.-...}, at: [<8042c3c4>] skb_queue_tail+0x20/0x50
       {HARDIRQ-ON-W} state was registered at:
       [<80067250>] mark_lock+0x154/0x4e8
       [<800676f4>] mark_irqflags+0x110/0x1a4
       [<80069208>] __lock_acquire+0x494/0x9c0
       [<80069ce8>] lock_acquire+0x90/0xa4
       [<80527ad0>] _raw_spin_lock_bh+0x44/0x54
       [<804877e0>] first_packet_length+0x38/0x1f0
       [<804879e4>] udp_poll+0x4c/0x5c
       [<804231f8>] sock_poll+0x24/0x28
       [<800d27f0>] do_poll.isra.10+0x120/0x254
       [<800d36e4>] do_sys_poll+0x15c/0x1e8
       [<800d3828>] sys_poll+0x60/0xc8
       [<8000e780>] ret_fast_syscall+0x0/0x3c
      
       *** DEADLOCK ***
      
       1 lock held by ptp4l/615:
        #0:  (&(&fep->hw_lock)->rlock){-.-...}, at: [<80355f9c>] fec_enet_tx+0x24/0x268
        stack backtrace:
        Backtrace:
        [<800121e0>] (dump_backtrace+0x0/0x10c) from [<80516210>] (dump_stack+0x18/0x1c)
        r6:8063b1fc r5:bf38b2f8 r4:bf38b000 r3:bf38b000
        [<805161f8>] (dump_stack+0x0/0x1c) from [<805189d0>] (print_usage_bug.part.34+0x164/0x1a4)
        [<8051886c>] (print_usage_bug.part.34+0x0/0x1a4) from [<80518a88>] (print_usage_bug+0x78/0x88)
        r8:80065664 r7:bf38b2f8 r6:00000002 r5:00000000 r4:bf38b000
        [<80518a10>] (print_usage_bug+0x0/0x88) from [<80518b58>] (mark_lock_irq+0xc0/0x270)
        r7:bf38b000 r6:00000002 r5:bf38b2f8 r4:00000000
        [<80518a98>] (mark_lock_irq+0x0/0x270) from [<80067270>] (mark_lock+0x174/0x4e8)
        [<800670fc>] (mark_lock+0x0/0x4e8) from [<80067744>] (mark_irqflags+0x160/0x1a4)
        [<800675e4>] (mark_irqflags+0x0/0x1a4) from [<80069208>] (__lock_acquire+0x494/0x9c0)
        r5:00000002 r4:bf38b2f8
        [<80068d74>] (__lock_acquire+0x0/0x9c0) from [<80069ce8>] (lock_acquire+0x90/0xa4)
        [<80069c58>] (lock_acquire+0x0/0xa4) from [<805278d8>] (_raw_spin_lock_irqsave+0x4c/0x60)
        [<8052788c>] (_raw_spin_lock_irqsave+0x0/0x60) from [<8042c3c4>] (skb_queue_tail+0x20/0x50)
        r6:bfbb2180 r5:bf1d0190 r4:bf1d0184
        [<8042c3a4>] (skb_queue_tail+0x0/0x50) from [<8042c4cc>] (sock_queue_err_skb+0xd8/0x188)
        r6:00000056 r5:bfbb2180 r4:bf1d0000 r3:00000000
        [<8042c3f4>] (sock_queue_err_skb+0x0/0x188) from [<8042d15c>] (skb_tstamp_tx+0x70/0xa0)
        r6:bf0dddb0 r5:bf1d0000 r4:bfbb2180 r3:00000004
        [<8042d0ec>] (skb_tstamp_tx+0x0/0xa0) from [<803561d0>] (fec_enet_tx+0x258/0x268)
        r6:c089d260 r5:00001c00 r4:bfbd0000
        [<80355f78>] (fec_enet_tx+0x0/0x268) from [<803562cc>] (fec_enet_interrupt+0xec/0xf8)
        [<803561e0>] (fec_enet_interrupt+0x0/0xf8) from [<8007d5b0>] (handle_irq_event_percpu+0x54/0x1a0)
        [<8007d55c>] (handle_irq_event_percpu+0x0/0x1a0) from [<8007d740>] (handle_irq_event+0x44/0x64)
        [<8007d6fc>] (handle_irq_event+0x0/0x64) from [<80080690>] (handle_fasteoi_irq+0xc4/0x15c)
        r6:bf0dc000 r5:bf811290 r4:bf811240 r3:00000000
        [<800805cc>] (handle_fasteoi_irq+0x0/0x15c) from [<8007ceec>] (generic_handle_irq+0x28/0x38)
        r5:807130c8 r4:00000096
        [<8007cec4>] (generic_handle_irq+0x0/0x38) from [<8000f16c>] (handle_IRQ+0x54/0xb4)
        r4:8071d280 r3:00000180
        [<8000f118>] (handle_IRQ+0x0/0xb4) from [<80008544>] (gic_handle_irq+0x30/0x64)
        r8:8000e924 r7:f4000100 r6:bf0ddef8 r5:8071c974 r4:f400010c
        r3:00000000
        [<80008514>] (gic_handle_irq+0x0/0x64) from [<8000e2e4>] (__irq_svc+0x44/0x5c)
        Exception stack(0xbf0ddef8 to 0xbf0ddf40)
      Signed-off-by: NFrank Li <Frank.Li@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de5fb0a0
  5. 03 3月, 2013 5 次提交
    • F
      ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver · e2ca90c2
      Freddy Xin 提交于
      This is a resubmission.
      Added kfree() in ax88179_get_eeprom to prevent memory leakage.
      Modified "__le16 rxctl" to "u16 rxctl" in "struct ax88179_data" and removed pointless casts.
      Removed asix_init and asix_exit functions and added "module_usb_driver(ax88179_178a_driver)".
      Fixed endianness issue on big endian systems and verified this driver on iBook G4.
      Removed steps that change net->features in ax88179_set_features function.
      Added "const" to ethtool_ops structure and fixed the coding style of AX88179_BULKIN_SIZE array.
      Fixed the issue that the default MTU is not 1500.
      Added ax88179_change_mtu function and enabled the hardware jumbo frame function to support an
      MTU higher than 1500.
      Fixed indentation and empty line coding style errors.
      The _nopm version usb functions were added to access register in suspend and resume functions.
      Serveral variables allocted dynamically were removed and replaced by stack variables.
      ax88179_get_eeprom were modified from asix_get_eeprom in asix_common.
      
      This patch adds a driver for ASIX's AX88179 family of USB 3.0/2.0
      to gigabit ethernet adapters. It's based on the AX88xxx driver but
      the usb commands used to access registers for AX88179 are completely different.
      This driver had been verified on x86 system with AX88179/AX88178A and
      Sitcomm LN-032 USB dongles.
      Signed-off-by: NFreddy Xin <freddy@asix.com.tw>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2ca90c2
    • J
      metag: Internal and external irqchips · 5698c50d
      James Hogan 提交于
      Meta core internal interrupts (from HWSTATMETA and friends) are vectored
      onto the TR1 core trigger for the current thread. This is demultiplexed
      in irq-metag.c to individual Linux IRQs for each internal interrupt.
      
      External SoC interrupts (from HWSTATEXT and friends) are vectored onto
      the TR2 core trigger for the current thread. This is demultiplexed in
      irq-metag-ext.c to individual Linux IRQs for each external SoC interrupt.
      The external irqchip has devicetree bindings for configuring the number
      of irq banks and the type of masking available.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Dom Cobley <popcornmix@gmail.com>
      Cc: Simon Arlott <simon@fire.lp0.eu>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      5698c50d
    • J
      metag: Time keeping · a2c5d4ed
      James Hogan 提交于
      Add time keeping code for metag. Meta hardware threads have 2 timers.
      The background timer (TXTIMER) is used as a free-running time base, and
      the interrupt timer (TXTIMERI) is used for the timer interrupt. Both
      counters traditionally count at approximately 1MHz.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a2c5d4ed
    • H
      parisc: check return value of down_interruptible() in hp_sdc_rtc.c · 15fb9683
      Helge Deller 提交于
      additionally comment out unused code (which may be used later)
      Signed-off-by: NHelge Deller <deller@gmx.de>
      15fb9683
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d