XPath uses path expressions to select nodes or node-sets in an XML
document. The node is selected by following a path or steps.

The XML Example Document

We will use the following XML document in the examples below.

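    <?xml version="1.0" encoding="UTF-8"?>
    <bookstore>
      <book>
        <title lang="en">Harry Potter</title>
        <price>29.99</price>
      </book>
      <book>
        <title lang="en">Learning XML</title>
        <price>39.95</price>
      </book>
    </bookstore>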

Selecting Nodes

The most useful path expressions are listed below:

Expression   Description

nodename     Selects all nodes with the name "nodename"

/            Selects from the root node

//           Selects nodes in the document from the current node that match the selection, no matter where they are

.            Selects the current node (in nested XPath queries, starting with . keeps the search inside the current node instead of picking up matches from the level above)

..           Selects the parent of the current node

@            Selects attributes
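To try these expressions, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, assuming the bookstore document above. ElementTree implements only a subset of XPath and evaluates every path relative to an element, so //title has to be written as .//title, and / (the document root) is not available on an element.

```python
import xml.etree.ElementTree as ET

# Assumed to match the bookstore sample document shown earlier in this post.
XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)                 # root is the <bookstore> element

books = root.findall("book")              # nodename: child elements called "book"
titles = root.findall(".//title")         # //: written as .// (anywhere below the current node)
current = root.find(".")                  # .: the current node itself
parents = root.findall(".//title/..")     # ..: the parent of each title
langs = [t.get("lang") for t in titles]   # @lang: read an attribute value

# "." keeps a nested query relative to one book instead of the whole document
first_book_titles = books[0].findall("./title")

print(len(books))                           # 2
print([t.text for t in titles])             # ['Harry Potter', 'Learning XML']
print(current.tag)                          # bookstore
print([p.tag for p in parents])             # ['book', 'book']
print(langs)                                # ['en', 'en']
print([t.text for t in first_book_titles])  # ['Harry Potter']
```

For full XPath 1.0 (absolute paths, functions such as contains()), a library such as lxml, whose elements provide an xpath() method, is the usual choice.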

In the table below we have listed some path expressions and the result
of the expressions:

Predicates

Predicates are used to find a specific node or a node that contains a
specific value.

Predicates are always embedded in square brackets.
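Predicates can be tried the same way. A minimal sketch, again assuming the bookstore document above and Python's xml.etree.ElementTree, which understands position, attribute and child-text predicates but not comparison operators such as >.

```python
import xml.etree.ElementTree as ET

# Assumed to match the bookstore sample document shown earlier in this post.
XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

first_title = root.find("book[1]/title").text      # [1]: positions start at 1, not 0
last_title = root.find("book[last()]/title").text  # [last()]: the last matching sibling
with_lang = root.findall(".//title[@lang]")         # [@lang]: has a lang attribute
english = root.findall(".//title[@lang='en']")      # [@lang='en']: attribute equals a value
cheap = root.findall("book[price='29.99']")         # [price='29.99']: child text equals a value

# ElementTree cannot evaluate numeric predicates such as book[price>35.00],
# so that condition is written as an ordinary Python filter instead.
expensive = [b for b in root.findall("book") if float(b.findtext("price")) > 35.00]

print(first_title, "|", last_title)              # Harry Potter | Learning XML
print(len(with_lang), len(english))              # 2 2
print([b.findtext("title") for b in cheap])      # ['Harry Potter']
print([b.findtext("title") for b in expensive])  # ['Learning XML']
```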

In the table below we have listed some path expressions with predicates
and the result of the expressions:

Selecting Unknown Nodes

XPath wildcards can be used to select unknown XML nodes.

Wildcard   Description

*          Matches any element node

@*         Matches any attribute node

node()     Matches any node of any kind

In the table below we have listed some path expressions and the result
of the expressions:

Path Expression   Result

/bookstore/*      Selects all the child element nodes of the bookstore element

//*               Selects all elements in the document

//title[@*]       Selects all title elements which have at least one attribute of any kind
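A sketch of the wildcard expressions with xml.etree.ElementTree, assuming the bookstore document above. The * step is supported directly; [@*] and node() are not, so the attribute check is written in plain Python.

```python
import xml.etree.ElementTree as ET

# Assumed to match the bookstore sample document shown earlier in this post.
XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

children = root.findall("*")       # /bookstore/* : every child element of bookstore
all_elements = list(root.iter())   # //* : every element, bookstore itself included

# ElementTree has no [@*] predicate; an element's attrib dict is empty when it
# has no attributes, so a truthiness test expresses "at least one attribute".
titles_with_attrs = [t for t in root.iter("title") if t.attrib]

print([c.tag for c in children])            # ['book', 'book']
print(len(all_elements))                    # 7
print([t.text for t in titles_with_attrs])  # ['Harry Potter', 'Learning XML']
```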

Selecting Several Paths

By using the | operator in an XPath expression you can select several
paths.

In the table below we have listed some path expressions and the result
of the expressions:

Path Expression                    Result

//book/title | //book/price        Selects all the title AND price elements of all book elements

//title | //price                  Selects all the title AND price elements in the document

/bookstore/book/title | //price    Selects all the title elements of the book element of the bookstore element AND all the price elements in the document
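ElementTree has no | operator, so this sketch emulates it with a small helper, union (a hypothetical name, not a library function). It merges the results of several paths, removes duplicates, and returns them in document order, which is how XPath processors typically report a union. The bookstore document above is assumed.

```python
import xml.etree.ElementTree as ET

# Assumed to match the bookstore sample document shown earlier in this post.
XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""


def union(root, *paths):
    """Hypothetical helper emulating XPath's | operator: merge the results
    of several paths, drop duplicates, and keep document order."""
    hits = set()
    for path in paths:
        hits.update(root.findall(path))
    # root.iter() walks the tree in document order
    return [e for e in root.iter() if e in hits]


root = ET.fromstring(XML)

book_fields = union(root, ".//book/title", ".//book/price")  # //book/title | //book/price
everywhere = union(root, ".//title", ".//price")             # //title | //price

print([e.tag for e in book_fields])  # ['title', 'price', 'title', 'price']
print([e.text for e in everywhere])  # ['Harry Potter', '29.99', 'Learning XML', '39.95']
```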

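normalize-space(./span[@class="app-history-action"])

In this expression, normalize-space can be treated roughly as the equivalent of InnerText: it returns the text of the matched span, with leading and trailing whitespace stripped and internal runs of whitespace collapsed into single spaces.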
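A minimal Python sketch of what normalize-space does. It assumes a hypothetical, well-formed HTML fragment (ElementTree is an XML parser and would reject most real-world HTML), selects the span with ElementTree, and then normalizes the text with a hypothetical helper, normalize_space, because ElementTree does not evaluate XPath functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment containing the span from the expression above.
HTML = """<li>
  <span class="app-history-action">
      Updated
      to   version 2.0
  </span>
</li>"""


def normalize_space(text):
    """Hypothetical helper approximating XPath 1.0 normalize-space():
    strip leading/trailing whitespace and collapse inner runs to one space.
    str.split() also treats non-ASCII whitespace (e.g. non-breaking spaces)
    as whitespace, which XPath 1.0 does not."""
    return " ".join(text.split())


node = ET.fromstring(HTML)
span = node.find("./span[@class='app-history-action']")
raw = "".join(span.itertext())  # all text inside the span, whitespace untouched

print(repr(normalize_space(raw)))  # 'Updated to version 2.0'
```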

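Using XPath to check whether an element has, or lacks, a given attribute or attribute value

When using an XPath to extract the InnerText of an HTML tag inside a repeating list, you sometimes need to check whether the tag's class attribute contains a particular value. XPath's contains() function handles this; the code is as follows: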

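    // Select span nodes that have no class attribute
    var noClass = node.SelectNodes(".//span[not(@class)]");

    // Select span nodes that have neither a class nor an id attribute
    var noClassOrId = node.SelectNodes(".//span[not(@class) and not(@id)]");

    // Select span nodes whose class does not contain "expire" (spans with no class also match)
    var notExpire = node.SelectNodes(".//span[not(contains(@class, 'expire'))]");

    // Select span nodes whose class contains "expire" (a substring test, so "expired" matches too)
    var expire = node.SelectNodes(".//span[contains(@class, 'expire')]");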
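The same filters can be sketched with Python's xml.etree.ElementTree, which supports [@class] and [@class='value'] but not not() or contains(), so those conditions become list comprehensions. The HTML fragment is hypothetical and stands in for the node the C# snippets above search. The last filter shows how to match a whole class name rather than a substring.

```python
import xml.etree.ElementTree as ET

# Hypothetical list item standing in for "node" in the C# snippets above.
HTML = """<li>
  <span>plain</span>
  <span id="a1">has id</span>
  <span class="expire">expired</span>
  <span class="price expired-soon">soon</span>
  <span class="price">active</span>
</li>"""

node = ET.fromstring(HTML)
spans = node.findall(".//span")  # every span below the current node

# .//span[not(@class)]
no_class = [s for s in spans if "class" not in s.attrib]

# .//span[not(@class) and not(@id)]
no_class_or_id = [s for s in spans if "class" not in s.attrib and "id" not in s.attrib]

# .//span[contains(@class,'expire')]: a substring test, so "expired-soon" matches too
expire = [s for s in spans if "expire" in s.get("class", "")]

# .//span[not(contains(@class,'expire'))]: spans with no class attribute also qualify
not_expire = [s for s in spans if "expire" not in s.get("class", "")]

# Whole-name match: one of the space-separated class names must be exactly "expire"
expire_token = [s for s in spans if "expire" in s.get("class", "").split()]

print([s.text for s in no_class])        # ['plain', 'has id']
print([s.text for s in no_class_or_id])  # ['plain']
print([s.text for s in expire])          # ['expired', 'soon']
print([s.text for s in not_expire])      # ['plain', 'has id', 'active']
print([s.text for s in expire_token])    # ['expired']
```

Because contains() is a substring test, contains(@class,'expire') also matches class="price expired-soon". In pure XPath 1.0 the whole-name version is usually written as contains(concat(' ', normalize-space(@class), ' '), ' expire ').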


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!
