lxml.cssselect module

CSS Selectors based on XPath.

This module supports selecting XML/HTML tags based on CSS selectors. See the CSSSelector class for details.

This is a thin wrapper around cssselect 0.7 or later.

class lxml.cssselect.CSSSelector(css, namespaces=None, translator='xml')

Bases: XPath

A CSS selector.

Usage:

>>> from lxml import etree, cssselect
>>> select = cssselect.CSSSelector("a tag > child")

>>> root = etree.XML("<a><b><c/><tag><child>TEXT</child></tag></b></a>")
>>> [ el.tag for el in select(root) ]
['child']

To use CSS namespaces, pass a prefix-to-namespace mapping as the namespaces keyword argument:

>>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
>>> select_ns = cssselect.CSSSelector('root > rdf|Description',
...                                   namespaces={'rdf': rdfns})

>>> rdf = etree.XML((
...     '<root xmlns:rdf="%s">'
...       '<rdf:Description>blah</rdf:Description>'
...     '</root>') % rdfns)
>>> [(el.tag, el.text) for el in select_ns(rdf)]
[('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]

evaluate(self, _eval_arg, **_variables)

Evaluate an XPath expression.

Instead of calling this method, you can also call the evaluator object itself.

Variables may be provided as keyword arguments. Note that namespaces are currently not supported for variables.

Deprecated:

call the object, not its method.

error_log
path

The literal XPath expression.

class lxml.cssselect.LxmlHTMLTranslator(xhtml=False)

Bases: LxmlTranslator, HTMLTranslator

lxml extensions + HTML support.

xpathexpr_cls

alias of XPathExpr

css_to_xpath(css, prefix='descendant-or-self::')

Translate a group of selectors to XPath.

Pseudo-elements are not supported here since XPath only knows about “real” elements.

Parameters:
  • css – A group of selectors as a Unicode string.

  • prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.

Raises:

SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.

Returns:

The equivalent XPath 1.0 expression as a Unicode string.

pseudo_never_matches(xpath)

Common implementation for pseudo-classes that never match.

selector_to_xpath(selector, prefix='descendant-or-self::', translate_pseudo_elements=False)

Translate a parsed selector to XPath.

Parameters:
  • selector – A parsed Selector object.

  • prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.

  • translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.

Raises:

ExpressionError on unknown/unsupported selectors.

Returns:

The equivalent XPath 1.0 expression as a Unicode string.

xpath(parsed_selector)

Translate any parsed selector object.

xpath_active_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_attrib(selector)

Translate an attribute selector.

xpath_attrib_dashmatch(xpath, name, value)
xpath_attrib_different(xpath, name, value)
xpath_attrib_equals(xpath, name, value)
xpath_attrib_exists(xpath, name, value)
xpath_attrib_includes(xpath, name, value)
xpath_attrib_prefixmatch(xpath, name, value)
xpath_attrib_substringmatch(xpath, name, value)
xpath_attrib_suffixmatch(xpath, name, value)
xpath_checked_pseudo(xpath)

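In GenericTranslator this is the common never-matching implementation; LxmlHTMLTranslator inherits HTMLTranslator's override instead, which matches checked checkbox and radio inputs and selected options.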

xpath_child_combinator(left, right)

right is an immediate child of left

xpath_class(class_selector)

Translate a class selector.

xpath_combinedselector(combined)

Translate a combined selector.

xpath_contains_function(xpath, function)
xpath_descendant_combinator(left, right)

right is a child, grand-child or further descendant of left

xpath_direct_adjacent_combinator(left, right)

right is a sibling immediately after left

xpath_disabled_pseudo(xpath)

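In GenericTranslator this is the common never-matching implementation; LxmlHTMLTranslator inherits HTMLTranslator's override instead, an HTML-specific test based on form-control elements and the disabled attribute.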

xpath_element(selector)

Translate a type or universal selector.

xpath_empty_pseudo(xpath)
xpath_enabled_pseudo(xpath)

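For xpath_enabled_pseudo: in GenericTranslator this is the common never-matching implementation; LxmlHTMLTranslator inherits HTMLTranslator's override instead, an HTML-specific test based on form-control elements and the disabled attribute.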

xpath_first_child_pseudo(xpath)
xpath_first_of_type_pseudo(xpath)
xpath_focus_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_function(function)

Translate a functional pseudo-class.

xpath_hash(id_selector)

Translate an ID selector.

xpath_hover_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_indirect_adjacent_combinator(left, right)

right is a sibling after left, immediately or not

xpath_lang_function(xpath, function)
xpath_last_child_pseudo(xpath)
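xpath_last_of_type_pseudo(xpath)
xpath_link_pseudo(xpath)

In GenericTranslator this is the common never-matching implementation; LxmlHTMLTranslator inherits HTMLTranslator's override instead, which matches a, link and area elements that have an href attribute.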

static xpath_literal(s)
xpath_negation(negation)
xpath_nth_child_function(xpath, function, last=False, add_name_test=True)
xpath_nth_last_child_function(xpath, function)
xpath_nth_last_of_type_function(xpath, function)
xpath_nth_of_type_function(xpath, function)
xpath_only_child_pseudo(xpath)
xpath_only_of_type_pseudo(xpath)
xpath_pseudo(pseudo)

Translate a pseudo-class.

xpath_pseudo_element(xpath, pseudo_element)

Translate a pseudo-element.

Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.

xpath_root_pseudo(xpath)
xpath_scope_pseudo(xpath)
xpath_target_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_visited_pseudo(xpath)

Common implementation for pseudo-classes that never match.

attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
id_attribute = 'id'

The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors

lang_attribute = 'lang'

The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo

lower_case_attribute_names = False
lower_case_attribute_values = False
lower_case_element_names = False

The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens

When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.

In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.

class lxml.cssselect.LxmlTranslator

Bases: GenericTranslator

A custom CSS selector to XPath translator with lxml-specific extensions.

xpathexpr_cls

alias of XPathExpr

css_to_xpath(css, prefix='descendant-or-self::')

Translate a group of selectors to XPath.

Pseudo-elements are not supported here since XPath only knows about “real” elements.

Parameters:
  • css – A group of selectors as a Unicode string.

  • prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.

Raises:

SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.

Returns:

The equivalent XPath 1.0 expression as a Unicode string.

pseudo_never_matches(xpath)

Common implementation for pseudo-classes that never match.

selector_to_xpath(selector, prefix='descendant-or-self::', translate_pseudo_elements=False)

Translate a parsed selector to XPath.

Parameters:
  • selector – A parsed Selector object.

  • prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.

  • translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.

Raises:

ExpressionError on unknown/unsupported selectors.

Returns:

The equivalent XPath 1.0 expression as a Unicode string.

xpath(parsed_selector)

Translate any parsed selector object.

xpath_active_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_attrib(selector)

Translate an attribute selector.

xpath_attrib_dashmatch(xpath, name, value)
xpath_attrib_different(xpath, name, value)
xpath_attrib_equals(xpath, name, value)
xpath_attrib_exists(xpath, name, value)
xpath_attrib_includes(xpath, name, value)
xpath_attrib_prefixmatch(xpath, name, value)
xpath_attrib_substringmatch(xpath, name, value)
xpath_attrib_suffixmatch(xpath, name, value)
xpath_checked_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_child_combinator(left, right)

right is an immediate child of left

xpath_class(class_selector)

Translate a class selector.

xpath_combinedselector(combined)

Translate a combined selector.

xpath_contains_function(xpath, function)
xpath_descendant_combinator(left, right)

right is a child, grand-child or further descendant of left

xpath_direct_adjacent_combinator(left, right)

right is a sibling immediately after left

xpath_disabled_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_element(selector)

Translate a type or universal selector.

xpath_empty_pseudo(xpath)
xpath_enabled_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_first_child_pseudo(xpath)
xpath_first_of_type_pseudo(xpath)
xpath_focus_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_function(function)

Translate a functional pseudo-class.

xpath_hash(id_selector)

Translate an ID selector.

xpath_hover_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_indirect_adjacent_combinator(left, right)

right is a sibling after left, immediately or not

xpath_lang_function(xpath, function)
xpath_last_child_pseudo(xpath)
xpath_last_of_type_pseudo(xpath)
xpath_link_pseudo(xpath)

Common implementation for pseudo-classes that never match.

static xpath_literal(s)
xpath_negation(negation)
xpath_nth_child_function(xpath, function, last=False, add_name_test=True)
xpath_nth_last_child_function(xpath, function)
xpath_nth_last_of_type_function(xpath, function)
xpath_nth_of_type_function(xpath, function)
xpath_only_child_pseudo(xpath)
xpath_only_of_type_pseudo(xpath)
xpath_pseudo(pseudo)

Translate a pseudo-class.

xpath_pseudo_element(xpath, pseudo_element)

Translate a pseudo-element.

Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.

xpath_root_pseudo(xpath)
xpath_scope_pseudo(xpath)
xpath_target_pseudo(xpath)

Common implementation for pseudo-classes that never match.

xpath_visited_pseudo(xpath)

Common implementation for pseudo-classes that never match.

attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
id_attribute = 'id'

The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors

lang_attribute = 'xml:lang'

The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo

lower_case_attribute_names = False
lower_case_attribute_values = False
lower_case_element_names = False

The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens

When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.

In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.

lxml.cssselect._make_lower_case(context, s)