1. 31 Jan, 2008 22 commits
    • S
      firewire: enforce access order between generation and node ID, fix "giving up on config rom" · b5d2a5e0
      Stefan Richter committed
      fw_device.node_id and fw_device.generation are accessed without mutexes.
      We have to ensure that all readers will get to see node_id updates
      before generation updates.
      
      Fixes an inability to recognize devices after "giving up on config rom",
      https://bugzilla.redhat.com/show_bug.cgi?id=429950
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      
      Reviewed by Nick Piggin <nickpiggin@yahoo.com.au>.
      
      Verified to fix 'giving up on config rom' issues on multiple system and
      drive combinations that were previously affected.
      Signed-off-by: Jarod Wilson <jwilson@redhat.com>
      Signed-off-by: Kristian Høgsberg <krh@redhat.com>
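To illustrate the ordering requirement described in this commit, here is a minimal user-space sketch. It is not the kernel patch: it swaps in C11 release/acquire atomics where the kernel would use its own memory barriers, and `demo_device`, `demo_update` and `demo_snapshot` are hypothetical names. The writer stores the node ID before the generation, and the reader loads the generation before the node ID, so a reader that observes a new generation also observes a node ID that is at least as new.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fw_device fields discussed above. */
struct demo_device {
	atomic_int node_id;
	atomic_int generation;
};

/* Writer: publish the new node ID first, then the generation. */
static void demo_update(struct demo_device *d, int node_id, int generation)
{
	assert(d != NULL);
	atomic_store_explicit(&d->node_id, node_id, memory_order_relaxed);
	/* Release: the node_id store above is visible to anyone who sees this store. */
	atomic_store_explicit(&d->generation, generation, memory_order_release);
}

/* Reader: load the generation first, then the node ID. */
static void demo_snapshot(struct demo_device *d, int *node_id, int *generation)
{
	assert(d != NULL && node_id != NULL && generation != NULL);
	/* Acquire: pairs with the release store in demo_update(). */
	*generation = atomic_load_explicit(&d->generation, memory_order_acquire);
	*node_id = atomic_load_explicit(&d->node_id, memory_order_relaxed);
}
```

A reader may still see a node ID that is newer than the generation it read, but never the reverse, which is the property the commit message asks for.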
    • S
      firewire: fw-cdev: use device generation, not card generation · cf5a56ac
      Stefan Richter committed
      We have to use the fw_device.generation here, not the fw_card.generation,
      because the generation must never be newer than the node ID when we emit
      a transaction.  This cannot be guaranteed with fw_card.generation.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      
      Verified in concert with subsequent memory barriers patch to fix 'giving
      up on config rom' issues on multiple system and drive combinations that
      were previously affected.
      Signed-off-by: Jarod Wilson <jwilson@redhat.com>
    • S
      firewire: fw-sbp2: use device generation, not card generation · 5a8a1bcd
      Stefan Richter committed
      There was a small window where a login or reconnect job could use an
      already updated card generation with an outdated node ID.  We have to
      use the fw_device.generation here, not the fw_card.generation, because
      the generation must never be newer than the node ID when we emit a
      transaction.  This cannot be guaranteed with fw_card.generation.
      
      Furthermore, the target's and initiator's node IDs can be obtained from
      fw_device and fw_card.  Dereferencing their underlying topology objects
      is not necessary.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      
      Verified in concert with subsequent memory barriers patch to fix 'giving
      up on config rom' issues on multiple system and drive combinations that
      were previously affected.
      Signed-off-by: Jarod Wilson <jwilson@redhat.com>
    • S
      firewire: fw-sbp2: try to increase reconnect_hold (speed up reconnection) · 14dc992a
      Stefan Richter committed
      Ask the target to grant a 4-second window for reconnection after bus
      reset instead of the standard and minimum 1 second.  This accelerates
      reconnection if there is more than one target on the bus:  If a login
      and inquiry to one target blocks the fw-sbp2 workqueue for more than 1s
      after bus reset, we now still can reconnect to the other target.
      
      Before that, fw-sbp2's reconnect attempts would be rejected with "error
      status: 0:9" (function rejected), and fw-sbp2 would finally re-login.
      All those futile reconnect attempts cost extra time until the target
      which needs re-login is ready for I/O again.
      
      The reconnect timeout field in the login ORB doesn't have to be honored
      by the target though.  I found that we could get up to
        - allegedly 32768s from an old OXFW911 firmware
        - 256s from LSI bridges
        - 4s from OXUF922 and OXFW912 bridges,
        - 2s from TI bridges,
        - only the standard 1s from Initio and Prolific bridges and from
          Apple OpenFirmware in target mode.
      
      We just try to get 4 seconds which already covers the case of a few
      HDDs on the same bus quite nicely.
      
      A minor drawback occurs in the following (rare and impractical) border
      case:
        - two initiators are there, initiator 1 holds an exclusive login to
          a target,
        - initiator 1 goes off the bus,
        - target refuses login attempts from initiator 2 until reconnect_hold
          seconds after bus reset.
      
      An alternative approach to the issue at hand would be to parallelize
      fw-sbp2's reconnect and login work.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Acked-by: Jarod Wilson <jwilson@redhat.com>
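The hold times reported in this commit (1s, 2s, 4s, 256s, 32768s) are all powers of two, which suggests that the reconnect field is an exponent. The sketch below assumes that encoding (hold time = 2^field seconds, with a 4-bit field); the helper names are hypothetical and are not taken from fw-sbp2.

```c
#include <assert.h>

/* Assumed encoding: hold time in seconds is 2^exponent, exponent in 0..15. */
static unsigned int demo_reconnect_hold_seconds(unsigned int exponent)
{
	assert(exponent <= 15);
	return 1u << exponent;
}

/* Smallest exponent whose hold time covers the wanted number of seconds. */
static unsigned int demo_reconnect_exponent(unsigned int seconds)
{
	unsigned int exponent = 0;

	assert(seconds >= 1 && seconds <= 32768);
	while ((1u << exponent) < seconds)
		exponent++;
	return exponent;
}
```

Under this assumption, asking for 4 seconds corresponds to an exponent of 2, and the 32768s answer from the old OXFW911 firmware corresponds to the maximum exponent of 15.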
    • S
      firewire: fw-sbp2: skip unnecessary logout · 4dccd020
      Stefan Richter committed
      Don't attempt to send a logout ORB if the target was already unplugged
      or had its link switched off.  If two targets are attached, this
      enhances the chance to quickly reconnect to the remaining target when
      one target is unplugged.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Acked-by: Jarod Wilson <jwilson@redhat.com>
    • S
      firewire vs. ieee1394: clarify MAINTAINERS · f148e20c
      Stefan Richter committed
      Maintainers like to receive less mail, and submitters like to have to Cc
      fewer recipients.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • D
      firewire: fw-ohci: Dynamically allocate buffers for DMA descriptors · fe5ca634
      David Moore committed
      Previously, the fw-ohci driver used fixed-length buffers for storing
      descriptors for isochronous receive DMA programs.  If an application
      (such as libdc1394) generated a DMA program that was too large, fw-ohci
      would reach the limit of its fixed-sized buffer and return an error to
      userspace.
      
      This patch replaces the fixed-length ring-buffer with a linked-list of
      page-sized buffers.  Additional buffers can be dynamically allocated and
      appended to the list when necessary.  For a particular context, buffers
      are kept around after use and reused as necessary, so there is no
      allocation taking place after the DMA program is generated for the first
      time.
      
      In addition, the buffers it uses are coherent for DMA so there is no
      syncing required before and after writes.  This syncing wasn't properly
      done in the previous version of the code.
      
      -
      
      This is the fourth version of my patch that replaces a fixed-length
      buffer for DMA descriptors with a dynamically allocated linked-list of
      buffers.
      
      As we discovered with the last attempt, new context programs are
      sometimes queued from interrupt context, making it unacceptable to call
      tasklet_disable() from context_get_descriptors().
      
      This version of the patch uses ohci->lock for all locking needs instead
      of tasklet_disable/enable.  There is a new requirement that
      context_get_descriptors() be called while holding ohci->lock.  It was
      already held for the AT context, so adding the requirement for the iso
      context did not seem particularly onerous.  In addition, this has the
      side benefit of allowing iso queue to be safely called from concurrent
      user-space threads, which previously was not safe.
      Signed-off-by: David Moore <dcm@acm.org>
      Signed-off-by: Kristian Høgsberg <krh@redhat.com>
      Signed-off-by: Jarod Wilson <jwilson@redhat.com>
      
      -
      
      Fixes the following issues:
        - Isochronous reception stopped prematurely if an application used a
          larger buffer.  (Reproduced with coriander.)
        - Isochronous reception stopped after one or a few frames on VT630x
          in OHCI 1.0 mode.  (Fixes reception in coriander, but dvgrab still
          doesn't work with these chips.)
      
      Patch update: struct member alignment, whitespace nits
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
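The buffer scheme this commit describes (a linked list of page-sized buffers that grows on demand and is reused afterwards) can be sketched in user space as follows. This is a simplified illustration, not the fw-ohci code: `calloc()` stands in for a DMA-coherent allocation, there is no locking, and every `demo_` name and the 4096-byte page size are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096

/* One page-sized buffer; the small header shares the page with the data. */
struct demo_buffer {
	struct demo_buffer *next;
	size_t used;
	unsigned char data[DEMO_PAGE_SIZE - sizeof(struct demo_buffer *) - sizeof(size_t)];
};

struct demo_context {
	struct demo_buffer *head;  /* first buffer ever allocated */
	struct demo_buffer *tail;  /* buffer currently being filled, NULL after rewind */
	size_t allocated;          /* number of buffers in the list */
};

/* Reserve `size` bytes. Moves on to the next buffer when the current one is
 * full, and allocates only if the list has no leftover buffer to reuse. */
static void *demo_get_descriptors(struct demo_context *ctx, size_t size)
{
	struct demo_buffer *buf = ctx->tail;
	void *p;

	assert(size <= sizeof(buf->data));
	if (buf == NULL || buf->used + size > sizeof(buf->data)) {
		struct demo_buffer *next = buf ? buf->next : ctx->head;

		if (next == NULL) {
			next = calloc(1, sizeof(*next));
			if (next == NULL)
				return NULL;
			if (buf)
				buf->next = next;
			else
				ctx->head = next;
			ctx->allocated++;
		}
		next->used = 0;
		ctx->tail = buf = next;
	}
	p = buf->data + buf->used;
	buf->used += size;
	return p;
}

/* Start a new program while keeping every buffer around for reuse. */
static void demo_rewind(struct demo_context *ctx)
{
	ctx->tail = NULL;
}
```

After the first program has been generated, a program of the same size needs no further allocation, which mirrors the reuse behaviour described in the commit message.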
    • S
      firewire: fw-ohci: CycleTooLong interrupt management · bb9f2206
      Stefan Richter committed
      The firewire-ohci driver so far lacked the ability to resume cycle
      master duty after that condition happened, as added to ohci1394 in Linux
      2.6.18 by commit 57fdb58f.  This ports that patch to fw-ohci.
      
      The "cycle too long" condition has been seen in practice
        - with IIDC cameras if a mode with packets too large for a speed is
          chosen,
        - sporadically when capturing DV on a VIA VT6306 card with ohci1394/
          ieee1394/ raw1394/ dvgrab 2.
          https://bugzilla.redhat.com/show_bug.cgi?id=415841#c7
      (This does not fix Fedora bug 415841.)
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • R
      firewire: Fix extraction of source node id · 478b233e
      Rabin Vincent committed
      Fix extraction of the source node id from the packet header.
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • D
      firewire: fw-ohci: Bug fixes for packet-per-buffer support · bcee893c
      David Moore committed
      This patch corrects a number of bugs in the current OHCI 1.0
      packet-per-buffer support:
      
      1. Correctly deal with payloads that cross a page boundary.  The
      previous version would not split the descriptor at such a boundary,
      potentially corrupting unrelated memory.
      
      2. Allow user-space to specify multiple packets per struct
      fw_cdev_iso_packet in the same way that dual-buffer allows.  This is
      signaled by header_length being a multiple of header_size.  This
      multiple determines the number of packets.  The payload size allocated
      per packet is determined by dividing the total payload size by the
      number of packets.
      
      3. Make sync support work properly for packet-per-buffer.
      
      I have tested this patch with libdc1394 by forcing my OHCI 1.1
      controller to use the packet-per-buffer support instead of dual-buffer.
      
      I would greatly appreciate testing by those who have DV devices and
      other types of iso streamers to make sure I didn't cause any
      regressions.
      
      Stefan, with this patch, I'm hoping that libdc1394 will work with all
      your OHCI 1.0 controllers now.
      
      The one bit of future work that remains for packet-per-buffer support is
      the automatic compaction of short payloads that I discussed with
      Kristian.
      Signed-off-by: David Moore <dcm@acm.org>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
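The arithmetic from item 2 of this commit (packet count from header_length / header_size, then an even split of the payload) can be written out as a small sketch. The function names are hypothetical and this is not the driver code; it only restates the rule given in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Number of packets described by one queue entry: header_length must be a
 * multiple of header_size, and the multiple is the packet count. */
static size_t demo_packet_count(size_t header_length, size_t header_size)
{
	assert(header_size > 0 && header_length % header_size == 0);
	return header_length / header_size;
}

/* Payload bytes allotted to each packet: the total divided by the count. */
static size_t demo_payload_per_packet(size_t payload_length, size_t packets)
{
	assert(packets > 0);
	return payload_length / packets;
}
```

For example, a header_length of 32 with a header_size of 8 describes 4 packets, and a total payload of 4096 bytes then gives each packet 1024 bytes.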
    • D
      firewire: fw-ohci: Fix for dualbuffer three-or-more buffers · 0642b657
      David Moore committed
      This patch fixes the problem where different OHCI 1.1 controllers behave
      differently when a received iso packet straddles three or more buffers
      when using the dual-buffer receive mode.  Two changes are made in order
      to handle this situation:
      
      1. The packet sync DMA descriptor is given a non-zero header length and
      non-zero payload length.  This is because zero-payload descriptors are
      not discussed in the OHCI 1.1 specs and their behavior is thus
      undefined.  Instead we use a header size just large enough for a single
      header and a payload length of 4 bytes for this first descriptor.
      
      2. As we process received packets in the context's tasklet, read the
      packet length out of the headers.  Keep track of the running total of
      the packet length as "excess_bytes", so we can ignore any descriptors
      where no packet starts or ends.  These descriptors may not have had
      their first_res_count or second_res_count fields updated by the
      controller so we cannot rely on those values.
      
      The main drawback of this patch is that the excess_bytes value might get
      "out of sync" with the packet descriptors if something strange happens
      to the DMA program.  I'm not sure if such a thing could ever happen, but I
      appreciate any suggestions in making it more robust.
      
      Also, the packet-per-buffer support may need a similar fix to deal with
      issue 1, but I haven't done any work on that yet.
      
      Stefan, I'm hoping that with this patch, all your OHCI 1.1 controllers
      will work properly with an unmodified version of libdc1394.
      Signed-off-by: David Moore <dcm@acm.org>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      firewire: fw-sbp2: remove unused misleading macro · 4b11ea96
      Stefan Richter committed
      SBP2_MAX_SECTORS is not used anywhere in fw-sbp2.
      It merely got copied over from sbp2 where it played a role in the past.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      b7811da2
    • S
      firewire: fw-sbp2: refactor workq and kref handling · 285838eb
      Stefan Richter committed
      This somewhat reduces the size of firewire-sbp2.ko.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      ieee1394: ohci1394: don't schedule IT tasklets on IR events · 85c5798b
      Stefan Richter committed
      Bug noted by Pieter Palmers:  Isochronous transmit tasklets were
      scheduled on isochronous receive events, in addition to the proper
      isochronous receive tasklets.
      
      http://marc.info/?l=linux1394-devel&m=119783196222802
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      ieee1394: sbp2: raise default transfer size limit · 4e6343a1
      Stefan Richter committed
      This patch speeds up sbp2 a little bit --- but more importantly, it
      brings the behavior of sbp2 and fw-sbp2 closer to each other.  Like
      fw-sbp2, sbp2 now does not limit the size of single transfers to 255
      sectors anymore, unless told so by a blacklist flag or by module load
      parameters.
      
      Only very old bridge chips have been known to need the 255 sectors
      limit, and we have got one such chip in our hardwired blacklist.  There
      certainly is a danger that more bridges need that limit; but I prefer to
      have this issue present in both fw-sbp2 and sbp2 rather than just one of
      them.
      
      An OXUF922 with 400GB 7200RPM disk on an S400 controller is sped up by
      this patch from 22.9 to 23.5 MB/s according to hdparm.  The same effect
      could be achieved before by setting a higher max_sectors module
      parameter.  On buses which use 1394b beta mode, sbp2 and fw-sbp2 will
      now achieve virtually the same bandwidth.  Fw-sbp2 only remains faster
      on 1394a buses due to fw-core's gap count optimization.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      ieee1394: remove unused code · 3e75b493
      Stefan Richter committed
      The code has been in "#if 0 - #endif" since Linux 2.6.12.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      ieee1394: small cleanup after "nopage" · c7ea990f
      Stefan Richter committed
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • N
      ieee1394: nopage · 61db8121
      Nick Piggin committed
      Convert ieee1394 from nopage to fault.
      Remove redundant vma range checks (correct resource range check is retained).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • J
      ieee1394: Add missing "space" · a5c52df8
      Joe Perches committed
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      ieee1394: sbp2: s/g list access cosmetics · 825f1df5
      Stefan Richter committed
      Replace sg->length by sg_dma_len(sg).  Rename a variable for shorter
      line lengths and eliminate some superfluous local variables.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • S
      8c4ac094
  2. 30 Jan, 2008 18 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 · dd430ca2
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (890 commits)
        x86: fix nodemap_size according to nodeid bits
        x86: fix overlap between pagetable with bss section
        x86: add PCI IDs to k8topology_64.c
        x86: fix early_ioremap pagetable ops
        x86: use the same pgd_list for PAE and 64-bit
        x86: defer cr3 reload when doing pud_clear()
        x86: early boot debugging via FireWire (ohci1394_dma=early)
        x86: don't special-case pmd allocations as much
        x86: shrink some ifdefs in fault.c
        x86: ignore spurious faults
        x86: remove nx_enabled from fault.c
        x86: unify fault_32|64.c
        x86: unify fault_32|64.c with ifdefs
        x86: unify fault_32|64.c by ifdef'd function bodies
        x86: arch/x86/mm/init_32.c printk fixes
        x86: arch/x86/mm/init_32.c cleanup
        x86: arch/x86/mm/init_64.c printk fixes
        x86: unify ioremap
        x86: fixes some bugs about EFI memory map handling
        x86: use reboot_type on EFI 32
        ...
    • L
      [net] Gracefully handle shared e1000/1000e driver PCI ID's · 60e23317
      Linus Torvalds committed
      Both the old e1000 driver and the new e1000e driver can drive some
      PCI-Express e1000 cards, and we should avoid ambiguity about which
      driver will pick up the support for those cards when both drivers are
      enabled.
      
      This solves the problem by having the old driver support those cards if
      the new driver isn't configured, but otherwise ceding support for PCI
      Express versions of the e1000 chipset to the newer driver.  This allows
      both legacy configurations where only the old driver is active (and
      handles all chips it knows about) and the new configuration with the
      new driver handling the more modern PCIE variants.
      Acked-by: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • L
      Make !NETFILTER_ADVANCED enable IP6_NF_MATCH_IPV6HEADER · 44c45eb9
      Linus Torvalds committed
      We want IPV6HEADER matching for the non-advanced default netfilter
      configuration, since it's part of the standard netfilter setup of at
      least some distributions (e.g. Fedora).
      
      Otherwise NETFILTER_ADVANCED loses much of its point, since even
      non-advanced users would have to enable all the advanced options just to
      get a working IPv6 netfilter setup.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Y
      x86: fix nodemap_size according to nodeid bits · afadcd78
      Yinghai Lu committed
      memnode.map is an s16 array because nodeid is 16 bits wide now,
      so nodemap_size needs to be increased according to those bits.
      Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Y
      x86: fix overlap between pagetable with bss section · 91987157
      Yinghai Lu committed
      An early crash on an 8-node 256 GB machine:
      
      Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
      BIOS-provided physical RAM map:
       BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
       BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
       BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
       BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable)
       BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data)
       BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS)
       BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved)
       BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
       BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
       BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
       BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
      Early serial console at I/O port 0x3f8 (options '115200n8')
      console [uart0] enabled
      end_pfn_map = 67239936
      Kernel panic - not syncing: Duplicated early reservation d40000-e42000
      
      Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3
      
      Call Trace:
       [<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10
       [<ffffffff80221657>] clear_local_APIC+0x5/0xcf
       [<ffffffff80221726>] disable_local_APIC+0x5/0x17
       [<ffffffff8021fe16>] smp_send_stop+0x46/0x4c
       [<ffffffff80235293>] panic+0x94/0x13e
       [<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34
       [<ffffffff80b9f1c5>] reserve_early+0x30/0x6c
       [<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc
       [<ffffffff80b9dc01>] setup_arch+0x21f/0x44e
       [<ffffffff80b978be>] start_kernel+0x6f/0x2c7
       [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3
      
      It turns out there is an overlap between the page table and bss.
      
      In System.map we have:
      ffffffff80d40420 b rsi_table
      ffffffff80d40620 B krb5_seq_lock
      ffffffff80d40628 b i.20437
      ffffffff80d40630 b xprt_rdma_inline_write_padding
      ffffffff80d40638 b sunrpc_table_header
      ffffffff80d40640 b zero
      ffffffff80d40644 b min_memreg
      ffffffff80d40648 b rpcrdma_tk_lock_g
      ffffffff80d40650 B sctp_assocs_id_lock
      ffffffff80d40658 B proc_net_sctp
      ffffffff80d40660 B sctp_assocs_id
      ffffffff80d40680 B sysctl_sctp_mem
      ffffffff80d40690 B sysctl_sctp_rmem
      ffffffff80d406a0 B sysctl_sctp_wmem
      ffffffff80d406b0 b sctp_ctl_socket
      ffffffff80d406b8 b sctp_pf_inet6_specific
      ffffffff80d406c0 b sctp_pf_inet_specific
      ffffffff80d406c8 b sctp_af_v4_specific
      ffffffff80d406d0 b sctp_af_v6_specific
      ffffffff80d406d8 b sctp_rand.33270
      ffffffff80d406dc b sctp_memory_pressure
      ffffffff80d406e0 b sctp_sockets_allocated
      ffffffff80d406e4 b sctp_memory_allocated
      ffffffff80d406e8 b sctp_sysctl_header
      ffffffff80d406f0 b zero
      ffffffff80d406f4 A __bss_stop
      ffffffff80d406f4 A _end
      
      table_start needs to be rounded up to PAGE_SIZE.
      
      Also make the panic message more informative.
      Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
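The fix named in this commit, rounding table_start up to PAGE_SIZE, is a standard power-of-two alignment. The sketch below is a generic illustration with hypothetical names, not the kernel's own macro, and it assumes a 4096-byte page.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL

/* Round addr up to the next multiple of align; align must be a power of two. */
static uint64_t demo_round_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

Taking the `_end` symbol from the System.map excerpt and assuming the usual 0xffffffff80000000 mapping base, bss ends at physical offset 0xd406f4. Rounding that up gives 0xd41000, so a table placed there no longer shares a page with bss.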
    • J
      x86: add PCI IDs to k8topology_64.c · bb4a1d64
      Joachim Deguara committed
      This just adds the PCI IDs of the northbridges of AMD's family 10h and 11h
      CPUs to k8topology discovery.
      Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Yinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • J
      x86: fix early_ioremap pagetable ops · f6df72e7
      Jeremy Fitzhardinge committed
      Put appropriate pagetable update hooks in so that paravirt knows
      what's going on in there.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • J
      x86: use the same pgd_list for PAE and 64-bit · e3ed910d
      Jeremy Fitzhardinge committed
      Use a standard list threaded through page->lru for maintaining the pgd
      list on PAE.  This is the same as 64-bit, and seems saner than using a
      non-standard list via page->index.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • J
      x86: defer cr3 reload when doing pud_clear() · fa28ba21
      Jeremy Fitzhardinge committed
      PAE mode requires that we reload cr3 in order to guarantee that
      changes to the pgd will be noticed by the processor.  This means that
      in principle pud_clear needs to reload cr3 every time.  However,
      because reloading cr3 implies a tlb flush, we want to avoid it where
      possible.
      
      pud_clear() is only used in a couple of places:
       - in free_pmd_range(), when pulling down a range of process address space, and
       - huge_pmd_unshare()
      
      In both cases, the calling code will do a tlb flush anyway, so
      there's no need to do it within pud_clear().
      
      In free_pmd_range(), the pud_clear is immediately followed by
      pmd_free_tlb(); we can hook that to make the mmu_gather do an
      unconditional full flush to make sure cr3 gets reloaded.
      
      In huge_pmd_unshare, it is followed by flush_tlb_range, which always
      results in a full cr3-reload tlb flush.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: William Irwin <wli@holomorphy.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • B
      x86: early boot debugging via FireWire (ohci1394_dma=early) · f212ec4b
      Bernhard Kaindl committed
      This patch adds a new configuration option, which adds support for a new
      early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
      to decide whether OHCI-1394 FireWire controllers should be initialized and
      enabled for physical DMA access to allow remote debugging of early problems
      like issues in ACPI or other subsystems which are executed very early.
      
      If the config option is not enabled, no code is changed, and if the boot
      parameter is not given, no new code is executed, and independent of that,
      all new code is freed after boot, so the config option can even be enabled
      in standard, non-debug kernels.
      
      With specialized tools, it is then possible to get debugging information
      from machines which have no serial ports (notebooks), such as the printk
      buffer contents or any data which can be referenced from global pointers,
      if it is stored below the 4GB limit.  Even memory dumps of the physical
      RAM region below the 4GB limit can be taken without any cooperation from
      the CPU of the host, so it does not matter if the machine crashes early.
      
      In the extreme, even kernel debuggers can be accessed in this way. I wrote
      a small kgdb module and an accompanying gdb stub for FireWire which allows
      gdb to talk to kgdb using remote memory reads and writes over FireWire.
      
      A version of the gdb stub for FireWire is able to read all global data
      from a system which is running a normal kernel without any kernel debugger,
      without any interruption or support of the system's CPU. That way, e.g. the
      task struct and so on can be read and even manipulated when the physical DMA
      access is granted.
      
      A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
      and I've put a copy online at
      ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
      
      It also has links to all the tools which are available to make use of it.
      Another copy of it is online at:
      ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
      Signed-Off-By: Bernhard Kaindl <bk@suse.de>
      Tested-By: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • J
      x86: don't special-case pmd allocations as much · 6194ba6f
      Jeremy Fitzhardinge committed
      In x86 PAE mode, stop treating pmds as a special case.  Previously
      they were always allocated and freed with the pgd.  This modifies the
      code to be the same as 64-bit mode, where they are allocated on
      demand.
      
      This is a step on the way to unifying 32/64-bit pagetable allocation
      as much as possible.
      
      There is a complicating wart, however.  When you install a new
      reference to a pmd in the pgd, the processor isn't guaranteed to see
      it unless you reload cr3.  Since reloading cr3 also has the
      side-effect of flushing the tlb, this is an expense that we want to
      avoid wherever possible.
      
      This patch simply avoids reloading cr3 unless the update is to the
      current pagetable.  Later patches will optimise this further.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: William Irwin <wli@holomorphy.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • H
      x86: shrink some ifdefs in fault.c · fd40d6e3
      Harvey Harrison committed
      The change from current to tsk in do_page_fault is safe as
      this is set at the very beginning of the function.
      
      Removes a likely() annotation from the 64-bit version; this
      could have instead been added to 32-bit.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • J
      x86: ignore spurious faults · 5b727a3b
      Jeremy Fitzhardinge committed
      When changing a kernel page from RO->RW, it's OK to leave stale TLB
      entries around, since doing a global flush is expensive and they pose
      no security problem.  They can, however, generate a spurious fault,
      which we should catch and simply return from (which will have the
      side-effect of reloading the TLB to the current PTE).
      
      This can occur when running under Xen, because it frequently changes
      kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
      It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids
      doing a global TLB flush after changing page permissions.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
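The idea in this commit can be sketched as a check of the faulting access against the current PTE: if the PTE already permits the access, the fault came from a stale TLB entry and the handler can simply return. The flag values and names below are hypothetical and heavily simplified (only a present bit and a write bit are modelled); this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_PTE_PRESENT 0x1u
#define DEMO_PTE_WRITE   0x2u
#define DEMO_FAULT_WRITE 0x2u  /* the faulting access was a write */

/* Returns true if the current PTE already allows the faulting access,
 * which means the fault was caused by a stale TLB entry. */
static bool demo_fault_is_spurious(uint32_t pte, uint32_t error_code)
{
	/* Only the write bit of the error code is modelled in this sketch. */
	assert((error_code & ~DEMO_FAULT_WRITE) == 0);

	if (!(pte & DEMO_PTE_PRESENT))
		return false;
	if ((error_code & DEMO_FAULT_WRITE) && !(pte & DEMO_PTE_WRITE))
		return false;
	return true;
}
```

A write fault on a page whose PTE is now present and writable is reported as spurious here, which matches the RO->RW case described in the commit message.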
    • H
      x86: remove nx_enabled from fault.c · b406ac61
      Harvey Harrison committed
      On !PAE 32-bit, _PAGE_NX will be 0, making is_prefetch always
      return early.  The test is sufficient on PAE as __supported_pte_mask
      is updated in the same places as nx_enabled in init_32.c which also
      takes disable_nx into account.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • H
      x86: unify fault_32|64.c · c61e211d
      Harvey Harrison committed
      Unify includes in moved fault.c.
      
      Modify Makefiles to pick up unified file.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • H
      x86: unify fault_32|64.c with ifdefs · f8c2ee22
      Harvey Harrison committed
      Elimination of these ifdefs can be done in a unified file.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • H
      x86: unify fault_32|64.c by ifdef'd function bodies · 1156e098
      Harvey Harrison committed
      It's about time to get on with unifying these files, elimination
      of the ugly ifdefs can occur in the unified file.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • I
      x86: arch/x86/mm/init_32.c printk fixes · d7d119d7
      Ingo Molnar committed
      printk fixes. NOP in terms of functionality, but strings got
      a bit larger due to the KERN_ markers that were added.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>