# Parsing a web page with lxml

Use XPath to extract all of the text from the page.

```python
# -*- coding: UTF-8 -*-
from lxml import etree

def fetch_text(html):
    # TODO(You): implement your code here
    return result

if __name__ == '__main__':
    html = '''
        <html>
            <head>
                <title>This is a simple test page</title>
            </head>
            <body>
                <p>The content of the body element is displayed in the browser.</p>
                <p>The content of the title element is displayed in the browser's title bar.</p>
            </body>
        </html>
        '''
    texts = fetch_text(html)
    print(texts)
```

Select the option below that **correctly** implements this function.

## template

```python

from lxml import etree


def fetch_text(html):
    html = etree.HTML(html)
    result = html.xpath("//text()")
    return result


def test():
    html = '''
        <html>
            <head>
                <title>This is a simple test page</title>
            </head>
            <body>
                <p>The content of the body element is displayed in the browser.</p>
                <p>The content of the title element is displayed in the browser's title bar.</p>
            </body>
        </html>
        '''
    texts = fetch_text(html)
    print(texts)

if __name__ == '__main__':
    test()
```

## Answer

```python
def fetch_text(html):
    html = etree.HTML(html)
    result = html.xpath("//text()")
    return result
```
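
The sketch below is one way to see why `//text()` is the expression that matches. It runs `//text()`, `/text()`, and `//text` against a small sample document. The names `SAMPLE`, `compare_expressions`, and `fetch_clean_text` are hypothetical helpers introduced for illustration and are not part of the exercise. Because `//text()` also returns the whitespace between tags, the second helper shows an optional clean-up step, assuming whitespace-only strings are not wanted.

```python
from lxml import etree

# Hypothetical sample document, used only for this illustration.
SAMPLE = """
<html>
    <head><title>Test page</title></head>
    <body>
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
    </body>
</html>
"""


def compare_expressions(html):
    """Evaluate each candidate XPath expression and return the results."""
    root = etree.HTML(html)
    expressions = ("//text()", "/text()", "//text")
    return {expr: root.xpath(expr) for expr in expressions}


def fetch_clean_text(html):
    """Return every text node with surrounding whitespace removed,
    skipping nodes that contain only whitespace."""
    root = etree.HTML(html)
    return [t.strip() for t in root.xpath("//text()") if t.strip()]


if __name__ == "__main__":
    for expr, nodes in compare_expressions(SAMPLE).items():
        print(expr, "matched", len(nodes), "node(s)")
    print(fetch_clean_text(SAMPLE))
```

`/text()` is an absolute path that looks only for text nodes directly under the document root, and `//text` selects elements named `text` rather than text nodes, so both return an empty list for this document.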

## Options

### A

```python
def fetch_text(html):
    html = etree.HTML(html)
    result = html.xpath("/text()")
    return result
```

### B

```python
def fetch_text(html):
    html = etree.HTML(html)
    result = html.xpath("//text")
    return result
```

### C

```python
def fetch_text(html):
    html = etree.HTML(html)
    result = html.xpath("/text()")
    return result
```