
I am using lxml to parse XML from an external service that uses namespace prefixes but doesn't declare them with xmlns. I am trying to register the prefix by hand with register_namespace, but that doesn't seem to work.

from lxml import etree

xml = """
    <Foo xsi:type="xsd:string">bar</Foo>
"""

etree.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
el = etree.fromstring(xml)  # raises lxml.etree.XMLSyntaxError: Namespace prefix xsi for type on Foo is not defined

What am I missing? Oddly enough, when I looked at the lxml source code to try to understand what I might be doing wrong, it seemed as if the xsi namespace should already be there as one of the default namespaces.

Alex Turpin

2 Answers


When an XML document is parsed and then saved again, lxml does not change any prefixes (and register_namespace has no effect).
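
A small sketch of that round trip, assuming a made-up document that binds the example URI http://example.com to the prefix x (neither the document nor the abc prefix comes from the question):

```python
from lxml import etree

# Register a preferred prefix for the URI (illustrative values).
etree.register_namespace("abc", "http://example.com")

# The parsed document already binds the same URI to the prefix "x".
doc = etree.fromstring('<x:Foo xmlns:x="http://example.com"/>')

# Serializing reuses the prefix found in the parsed document.
roundtrip = etree.tostring(doc).decode()
print(roundtrip)
```

The prefix written in the source document (x) is kept; the registered abc prefix is not applied to it.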

If your XML document does not declare its namespace prefixes, it is not namespace-well-formed. Using register_namespace before parsing cannot fix this.
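
If the service's output cannot be changed, one possible workaround is to supply the missing declarations yourself before parsing. The sketch below wraps the fragment in a throwaway root element that declares the prefixes. The helper name parse_with_prefixes, the wrapper element, and the xsd URI mapping are assumptions for illustration, not part of lxml or the question, and the sketch assumes the input carries no XML declaration or DOCTYPE of its own.

```python
from lxml import etree

# Assumed prefix-to-URI mapping; adjust to whatever the service really means.
ASSUMED_NSMAP = {
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xsd": "http://www.w3.org/2001/XMLSchema",
}

def parse_with_prefixes(fragment, nsmap):
    """Parse a fragment whose prefixes are undeclared (hypothetical helper)."""
    declarations = " ".join(
        f'xmlns:{prefix}="{uri}"' for prefix, uri in nsmap.items()
    )
    # Child elements inherit the declarations made on the wrapper.
    wrapped = f"<wrapper {declarations}>{fragment}</wrapper>"
    root = etree.fromstring(wrapped)
    return root[0]  # first child of the wrapper, i.e. the original root

el = parse_with_prefixes('<Foo xsi:type="xsd:string">bar</Foo>', ASSUMED_NSMAP)
print(el.tag, dict(el.attrib))
```

The returned element stays attached to the wrapper, so this is only a starting point; it also silently trusts that the guessed URIs match what the service intended.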


register_namespace defines the prefixes to be used when serializing a newly created XML document.

Example 1 (without register_namespace):

from lxml import etree

el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())

Output:

<ns0:Foo xmlns:ns0="http://example.com"/>

Example 2 (with register_namespace):

from lxml import etree

etree.register_namespace("abc", "http://example.com")

el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())

Output:

<abc:Foo xmlns:abc="http://example.com"/>

Example 3 (without register_namespace, but with a "well-known" namespace associated with a conventional prefix):

from lxml import etree

el = etree.Element('{http://www.w3.org/2001/XMLSchema-instance}Foo')
print(etree.tostring(el).decode())

Output:

<xsi:Foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
mzjn

Namespace-well-formed XML that uses namespace prefixes must also declare them. Adding an xmlns:xsi declaration to the root element is enough:

from lxml import etree

xml = """
    <Foo xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:type='xsd:string'>bar</Foo>
"""
el = etree.fromstring(xml)
print(el)

So, technically, if your XML uses xsi but it does not contain the namespace declaration, it's not (namespace) well-formed XML.
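
Once the declaration is present, the attribute can be read by its namespace URI rather than by its prefix. A minimal sketch, reusing the document from this answer (the variable names are my own):

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

el = etree.fromstring(
    "<Foo xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' "
    "xsi:type='xsd:string'>bar</Foo>"
)

# Namespaced attributes are addressed as {uri}localname, not by prefix.
xsi_type = el.get(f"{{{XSI}}}type")
print(xsi_type)

# nsmap lists the prefix-to-URI bindings in scope on this element.
print(el.nsmap)
```

Note that the xsd prefix inside the attribute value is plain text to the parser, so parsing succeeds without declaring it; a schema-aware consumer would probably still need that declaration.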

See also How to restrict the value of an XML element using xsi:type in XSD?

kjhughes
Jongware
  • I see. What is the purpose of the `register_namespace` function in this case? – Alex Turpin Jan 22 '20 at 05:01
  • @AlexTurpin To be more clear: any string you parse with `etree.fromstring` must be a self-contained, well-formed XML document. Namespaces must be declared there, and this is unrelated to lxml code. `register_namespace` lets you add to the namespace URIs and corresponding prefixes that lxml "knows" but yes, `xsi:` might be pre-declared somewhere in the source code. – Mathias Müller Jan 22 '20 at 08:05