1. 29 Sep, 2011 9 commits
    • e1000: don't enable dma receives until after dma address has been setup · d5bc77a2
      Dean Nelson committed
      Doing an 'ifconfig ethN down' followed by an 'ifconfig ethN up' on a qemu-kvm
      guest system configured with two e1000 NICs can result in an 'unable to handle
      kernel paging request at 0000000100000000' or 'bad page map in process ...' or
      something similar.
      
      These result from a 4096-byte page being corrupted with the following two-word
      pattern (16 bytes) repeated throughout the entire page:
      
        0x0000000000000000
        0x0000000100000000
      
      There can be other bits set as well. What is constant is that the 2nd word
      has bit 32 set (counting from bit 0). So one could see:
      
              :
        0x0000000000000000
        0x0000000100000000
        0x0000000000000000
        0x0000000172adc067    <<< bad pte
        0x800000006ec60067
        0x0000000700000040
        0x0000000000000000
        0x0000000100000000
              :
      
      This came from a process's page table that I dumped out when the marked line
      was seen as bad by print_bad_pte().
      
      The repeating pattern represents the e1000's two-word receive descriptor:
      
      struct e1000_rx_desc {
              __le64 buffer_addr;   /* Address of the descriptor's data buffer */
              __le16 length;        /* Length of data DMAed into data buffer */
              __le16 csum;          /* Packet checksum */
              u8 status;            /* Descriptor status */
              u8 errors;            /* Descriptor Errors */
              __le16 special;
      };
      
      Bit 32 of the 2nd word (counting from bit 0) falls in the 'u8 status' member
      and corresponds to E1000_RXD_STAT_DD, which indicates the descriptor is done.
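
The bit arithmetic above can be checked with a small userspace sketch. It is not driver code: the struct mirrors the quoted e1000_rx_desc with fixed-width types standing in for __le64/__le16/u8, it assumes a little-endian host, and the helper names rx_desc_done() and rx_desc_second_word() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the descriptor quoted above (little-endian host assumed). */
struct rx_desc {
        uint64_t buffer_addr;   /* first 64-bit word */
        uint16_t length;        /* second word, bits 0-15 */
        uint16_t csum;          /* second word, bits 16-31 */
        uint8_t  status;        /* second word, bits 32-39 */
        uint8_t  errors;        /* second word, bits 40-47 */
        uint16_t special;       /* second word, bits 48-63 */
};

#define RXD_STAT_DD 0x01        /* value of E1000_RXD_STAT_DD in the e1000 driver */

/* The corruption pattern is 16 bytes wide, so the struct must be too. */
static_assert(sizeof(struct rx_desc) == 16, "descriptor must be two 64-bit words");

/* Hypothetical helper: an otherwise empty descriptor marked as done. */
static struct rx_desc rx_desc_done(void)
{
        struct rx_desc d;

        memset(&d, 0, sizeof(d));
        d.status = RXD_STAT_DD;
        return d;
}

/* Hypothetical helper: read the second 64-bit word as the CPU would see it. */
static uint64_t rx_desc_second_word(const struct rx_desc *d)
{
        uint64_t w;

        memcpy(&w, (const unsigned char *)d + 8, sizeof(w));
        return w;
}
```

On a little-endian machine, rx_desc_second_word() applied to rx_desc_done() yields 0x0000000100000000, the second word of the repeating pattern seen in the corrupted page.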
      
      The corruption appears to result from the following...
      
       . An 'ifconfig ethN down' gets us into e1000_close(), which through a number
         of subfunctions results in:
           1. E1000_RCTL_EN being cleared in RCTL register.  [e1000_down()]
           2. dma_free_coherent() being called.  [e1000_free_rx_resources()]
      
       . An 'ifconfig ethN up' gets us into e1000_open(), which through a number of
         subfunctions results in:
           1. dma_alloc_coherent() being called.  [e1000_setup_rx_resources()]
           2. E1000_RCTL_EN being set in RCTL register.  [e1000_setup_rctl()]
           3. E1000_RCTL_EN being cleared in RCTL register.  [e1000_configure_rx()]
           4. RDLEN, RDBAH and RDBAL registers being set to reflect the dma page
              allocated in step 1.  [e1000_configure_rx()]
           5. E1000_RCTL_EN being set in RCTL register.  [e1000_configure_rx()]
      
      During the 'ifconfig ethN up' a window opens between step 2, where receives
      are enabled, and step 3, where they are disabled again. Inside that window the
      address of the receive descriptor dma page known by the NIC is still the
      previous one, which was freed during the 'ifconfig ethN down'. If this memory
      has been reallocated for some other use and the NIC feels so inclined, it will
      write to that former dma page with predictably unpleasant results.
      
      I realize that in the guest, we're dealing with an e1000 NIC that is software
      emulated by qemu-kvm. The problem doesn't appear to occur on bare-metal. Andy
      suspects that this is because in the emulator link-up is essentially instant
      and traffic can start flowing immediately, whereas on bare-metal link-up
      usually seems to take at least a few milliseconds. That might be enough
      to prevent traffic from flowing into the device inside the window where
      E1000_RCTL_EN is set.
      
      So perhaps a modification needs to be made to the qemu-kvm e1000 NIC emulator
      to delay the link-up. But in defense of the emulator, it seems like a bad idea
      to enable dma operations before the address of the memory to be involved has
      been made known.
      
      The following patch no longer enables receives in e1000_setup_rctl() but leaves
      them however they were. It only enables receives in e1000_configure_rx(), and
      only after the dma address has been made known to the hardware.
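
The effect of the reordering can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_nic and every toy_* function are hypothetical, a single flag stands in for E1000_RCTL_EN, a single address stands in for RDBAH/RDBAL, and the step numbers refer to the 'ifconfig ethN up' list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the NIC state that matters for the problem window. */
struct toy_nic {
        bool     rx_enabled;    /* stands in for E1000_RCTL_EN */
        uint64_t ring_base;     /* stands in for RDBAH/RDBAL */
        bool     saw_stale_rx;  /* receives were enabled on a freed ring */
};

/* Flag a hazard if receives are enabled while the NIC still points at a
 * ring other than the one currently allocated. */
static void toy_check(struct toy_nic *nic, uint64_t live_ring)
{
        if (nic->rx_enabled && nic->ring_base != live_ring)
                nic->saw_stale_rx = true;
}

/* Old ordering: receives are enabled in step 2, before the ring is set. */
static bool toy_open_old(uint64_t freed_ring, uint64_t new_ring)
{
        struct toy_nic nic = { false, freed_ring, false };

        nic.rx_enabled = true;    toy_check(&nic, new_ring);  /* step 2 */
        nic.rx_enabled = false;   toy_check(&nic, new_ring);  /* step 3 */
        nic.ring_base = new_ring; toy_check(&nic, new_ring);  /* step 4 */
        nic.rx_enabled = true;    toy_check(&nic, new_ring);  /* step 5 */
        return nic.saw_stale_rx;
}

/* Patched ordering: step 2 leaves the enable bit however it was. */
static bool toy_open_new(uint64_t freed_ring, uint64_t new_ring)
{
        struct toy_nic nic = { false, freed_ring, false };

        toy_check(&nic, new_ring);                            /* step 2 */
        nic.rx_enabled = false;   toy_check(&nic, new_ring);  /* step 3 */
        nic.ring_base = new_ring; toy_check(&nic, new_ring);  /* step 4 */
        nic.rx_enabled = true;    toy_check(&nic, new_ring);  /* step 5 */
        return nic.saw_stale_rx;
}

/* Self-check: only the old ordering ever receives into the freed ring. */
static void toy_selfcheck(void)
{
        assert(toy_open_old(0x1000, 0x2000));
        assert(!toy_open_new(0x1000, 0x2000));
}
```

The model ignores timing entirely; it only shows that the old sequence passes through a state in which receives are enabled against the freed ring, while the patched sequence never does.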
      
      There are two places where e1000_setup_rctl() gets called. The one in
      e1000_configure() is followed immediately by a call to e1000_configure_rx(), so
      there's really no change functionally (except for the removal of the problem
      window). The other is in __e1000_shutdown() and is not followed by a call to
      e1000_configure_rx(), so there is a change functionally. But consider...
      
       . An 'ifconfig ethN down' (just as described above).
      
       . A 'suspend' of the system, which (I'm assuming) will find its way into
         e1000_suspend() which calls __e1000_shutdown() resulting in:
           1. E1000_RCTL_EN being set in RCTL register.  [e1000_setup_rctl()]
      
      And again we've re-opened the problem window for some unknown amount of time.
      Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Dean Nelson <dnelson@redhat.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • candev: allow SJW user setting for bittiming calculation · 2e114374
      Oliver Hartkopp committed
      This patch adds support for a user-defined synchronization jump width (SJW),
      so that the in-kernel bittiming calculation no longer sets the SJW to 1 in
      every case.
      
      The ip tool from iproute2 already supports passing a user-defined SJW
      value. The given SJW value is sanitized against the controller-specific
      sjw_max and the calculated tseg2 value. As the SJW can have values up to 4,
      providing this value automatically leads to the maximum possible SJW. A
      higher SJW allows higher controller oscillator tolerances.
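
The sanitizing described above amounts to clamping the requested value. A minimal sketch follows, assuming that a request of 0 means "not set by the user" and falls back to the old default of 1; sanitize_sjw() is a hypothetical helper, not the function the patch adds.

```c
#include <assert.h>

/* Hypothetical helper: choose the SJW to program from the user's request,
 * the controller limit sjw_max and the calculated tseg2. */
static unsigned int sanitize_sjw(unsigned int user_sjw,
                                 unsigned int sjw_max,
                                 unsigned int tseg2)
{
        unsigned int sjw = user_sjw;

        assert(sjw_max >= 1 && tseg2 >= 1);

        if (sjw == 0)           /* assumption: 0 means no user setting */
                return 1;       /* previous behaviour: SJW is always 1 */
        if (sjw > sjw_max)      /* never exceed the controller limit */
                sjw = sjw_max;
        if (sjw > tseg2)        /* never exceed phase segment 2 */
                sjw = tseg2;
        return sjw;
}
```

With this shape, asking for 4 on a controller whose sjw_max is 4 gives 4 when tseg2 is at least 4, and gives tseg2 when tseg2 is smaller, which matches the "maximum possible SJW" behaviour described in the message.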
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: Wolfgang Grandegger <wg@grandegger.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
    • Y
      8eac3f60
    • net: sh_eth: use ioremap() · ae70644d
      Yoshihiro Shimoda committed
      This patch also changes writel/readl to iowrite32/ioread32.
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • can/sja1000: add driver for EMS PCMCIA card · fd734c6f
      Oliver Hartkopp committed
      This patch adds the driver for the SJA1000-based PCMCIA card 'CPC-Card' from
      EMS Dr. Thomas Wuensche (http://www.ems-wuensche.de).
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: Markus Plessing <plessing@ems-wuensche.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • connector: add comm change event report to proc connector · f786ecba
      Vladimir Zapolskiy committed
      Add an event to monitor comm value changes of tasks.  Such an event
      becomes vital if someone wants to control the threads of a process in
      different ways.
      
      A natural characteristic of a thread is its comm value, and application
      developers can conveniently change it at runtime. Reporting such events
      via the proc connector allows fine-grained monitoring and control; for
      instance, a process control daemon listening to the proc connector and
      following comm value policies can place specific threads into assigned
      cgroup partitions.
      
      It might be possible to achieve a pale, partial, one-shot likeness without
      this update: an application changes the comm value of a thread generator
      task beforehand, a new thread is then cloned, and the proc connector
      listener gets the fork event and reads the new thread's comm value from
      the procfs stat file. This change, however, visibly simplifies and extends
      the matter.
      Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
      Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rps: fix the support for PPPOE · 5dd17e08
      Changli Gao committed
      The upper protocol numbers of PPPOE are different, and should be treated
      specially.
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_unix: dont send SCM_CREDENTIALS by default · 16e57262
      Eric Dumazet committed
      Since commit 7361c36c (af_unix: Allow credentials to work across
      user and pid namespaces) af_unix performance dropped a lot.
      
      This is because we now take a reference on pid and cred in each write(),
      and release them in read(), usually done from another process,
      possibly on another cpu. This triggers false sharing.
      
      # Events: 154K cycles
      #
      # Overhead  Command       Shared Object        Symbol
      # ........  .......  ..................  .........................
      #
          10.40%  hackbench  [kernel.kallsyms]   [k] put_pid
           8.60%  hackbench  [kernel.kallsyms]   [k] unix_stream_recvmsg
           7.87%  hackbench  [kernel.kallsyms]   [k] unix_stream_sendmsg
           6.11%  hackbench  [kernel.kallsyms]   [k] do_raw_spin_lock
           4.95%  hackbench  [kernel.kallsyms]   [k] unix_scm_to_skb
           4.87%  hackbench  [kernel.kallsyms]   [k] pid_nr_ns
           4.34%  hackbench  [kernel.kallsyms]   [k] cred_to_ucred
           2.39%  hackbench  [kernel.kallsyms]   [k] unix_destruct_scm
           2.24%  hackbench  [kernel.kallsyms]   [k] sub_preempt_count
           1.75%  hackbench  [kernel.kallsyms]   [k] fget_light
           1.51%  hackbench  [kernel.kallsyms]   [k] __mutex_lock_interruptible_slowpath
           1.42%  hackbench  [kernel.kallsyms]   [k] sock_alloc_send_pskb
      
      This patch includes SCM_CREDENTIALS information in an af_unix message/skb
      only if requested by the sender (see 'man 7 unix' for details on how to
      include ancillary data using the sendmsg() system call).
      
      Note: This might break buggy applications that expect SCM_CREDENTIALS
      from an unaware write() system call while the receiver is not using the
      SO_PASSCRED socket option.
      
      If SOCK_PASSCRED is set on source or destination socket, we still
      include credentials for mere write() syscalls.
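
For a sender that wants credentials attached explicitly, the sendmsg() path looks roughly like the sketch below. It assumes Linux with glibc; send_with_creds() and recv_with_creds() are hypothetical helper names, and the receiver is expected to have enabled SO_PASSCRED as man 7 unix requires.

```c
#define _GNU_SOURCE             /* for struct ucred */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for one struct ucred, aligned for struct cmsghdr. */
union cred_ctl {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
};

/* Send buf over fd with an explicit SCM_CREDENTIALS control message. */
static ssize_t send_with_creds(int fd, const void *buf, size_t len)
{
        struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
        struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
        union cred_ctl ctl;
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&msg, 0, sizeof(msg));
        memset(&ctl, 0, sizeof(ctl));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctl.buf;
        msg.msg_controllen = sizeof(ctl.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_CREDENTIALS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));
        memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));

        return sendmsg(fd, &msg, 0);
}

/* Receive into buf; *got_cred reports whether credentials were attached. */
static ssize_t recv_with_creds(int fd, void *buf, size_t len,
                               struct ucred *cred, int *got_cred)
{
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        union cred_ctl ctl;
        struct msghdr msg;
        struct cmsghdr *cmsg;
        ssize_t n;

        assert(cred != NULL && got_cred != NULL);
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctl.buf;
        msg.msg_controllen = sizeof(ctl.buf);

        *got_cred = 0;
        n = recvmsg(fd, &msg, 0);
        if (n < 0)
                return n;
        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
                if (cmsg->cmsg_level == SOL_SOCKET &&
                    cmsg->cmsg_type == SCM_CREDENTIALS) {
                        memcpy(cred, CMSG_DATA(cmsg), sizeof(*cred));
                        *got_cred = 1;
                }
        }
        return n;
}
```

A typical use would create a socketpair(AF_UNIX, SOCK_STREAM, 0, sv), enable SO_PASSCRED on the receiving end with setsockopt(), and then pair one send_with_creds() call with one recv_with_creds() call.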
      
      Performance boost in hackbench : more than 50% gain on a 16 thread
      machine (2 quad-core cpus, 2 threads per core)
      
      hackbench 20 thread 2000
      
      4.228 sec instead of 9.102 sec
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Sep, 2011 4 commits
  3. 27 Sep, 2011 12 commits
  4. 24 Sep, 2011 15 commits