XML (eXtensible Markup Language) is a markup language designed to store and transport data. Python offers a variety of libraries to parse and manipulate XML data.
ElementTree is part of Python's standard library and provides a lightweight, efficient API for parsing XML. Here's a simple example that reads an XML file:
import xml.etree.ElementTree as ET
tree = ET.parse('example.xml')
root = tree.getroot()
You can loop through the XML elements like this:
for child in root:
    print(child.tag, child.attrib)
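Beyond looping over direct children, ElementTree can also search by tag name. The following is a minimal sketch that parses an inline string instead of a file so that it runs on its own; the sample document and its element names (`library`, `book`, `title`) are hypothetical and not taken from any real `example.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag and attribute names are illustrative.
XML_TEXT = """
<library>
    <book id="b1"><title>Dune</title><year>1965</year></book>
    <book id="b2"><title>Neuromancer</title><year>1984</year></book>
</library>
""".strip()

# fromstring() parses a string and returns the root element directly.
root = ET.fromstring(XML_TEXT)

# findall() accepts a limited XPath subset; "book" matches direct children.
titles = [book.findtext("title") for book in root.findall("book")]

# iter() walks the whole subtree; get() reads a single attribute.
ids = [book.get("id") for book in root.iter("book")]

print(titles)
print(ids)
```

`findall()` only looks at direct children unless the path says otherwise, while `iter()` visits every descendant, so the two can return different results on deeper documents.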
Another built-in library is minidom, which allows for DOM-style parsing of XML files:
from xml.dom import minidom
doc = minidom.parse("example.xml")
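To show what DOM-style access looks like once a document is parsed, here is a small sketch using `minidom.parseString` on an inline string. The sample XML and its element names are assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document; the tag and attribute names are illustrative.
XML_TEXT = (
    '<library>'
    '<book id="b1"><title>Dune</title></book>'
    '<book id="b2"><title>Neuromancer</title></book>'
    '</library>'
)

doc = minidom.parseString(XML_TEXT)

# getElementsByTagName() returns every matching element in the document.
books = doc.getElementsByTagName("book")
book_ids = [book.getAttribute("id") for book in books]

# In the DOM, text is stored in child text nodes, so read firstChild.data.
book_titles = [
    book.getElementsByTagName("title")[0].firstChild.data for book in books
]

print(book_ids)
print(book_titles)
```

Compared with ElementTree, the DOM API is more verbose because text and attributes are separate node objects rather than properties of the element.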
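The lxml library is a third-party package (installed with `pip install lxml`) that offers more features, such as full XPath 1.0 and XSLT support, and generally better performance than the built-in parsers. To parse XML with lxml, you can do: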
from lxml import etree
tree = etree.parse('example.xml')  # returns an ElementTree, not the root element
root = tree.getroot()
While primarily used for web scraping, Beautiful Soup can also parse XML documents. Its 'lxml-xml' parser requires lxml to be installed:
from bs4 import BeautifulSoup
with open("example.xml", "r") as f:
    content = f.read()
soup = BeautifulSoup(content, 'lxml-xml')
Once you've parsed the XML data, you may want to convert it to other formats like JSON, CSV, or a Python dictionary.
Python's `json` library can be used to convert a Python dictionary to JSON.
import json
json_str = json.dumps(parsed_dict)  # parsed_dict: a dictionary built from the parsed XML
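The snippet above assumes a `parsed_dict` already exists. One possible way to build it from an ElementTree element is sketched below. The helper `element_to_dict` is hypothetical and not part of any library, and the conventions it uses (an `@` prefix for attributes, `#text` for mixed text, lists for repeated tags) are just one common choice among several.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into plain dicts, lists, and strings."""
    children = list(elem)
    # A leaf with no attributes collapses to its text content.
    if not children and not elem.attrib:
        return (elem.text or "").strip()

    # Attributes are stored under "@name" keys to keep them apart from tags.
    result = {f"@{name}": value for name, value in elem.attrib.items()}

    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value

    # Keep any text that sits alongside attributes or child elements.
    text = (elem.text or "").strip()
    if text:
        result["#text"] = text
    return result


# Hypothetical sample document; the tag and attribute names are illustrative.
root = ET.fromstring(
    '<library>'
    '<book id="b1"><title>Dune</title></book>'
    '<book id="b2"><title>Neuromancer</title></book>'
    '</library>'
)

parsed_dict = {root.tag: element_to_dict(root)}
json_str = json.dumps(parsed_dict, indent=2)
print(json_str)
```

This sketch ignores namespaces, tail text, and comments, so it is only suitable for simple, data-oriented XML.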
Use Python's `csv` library to write the parsed XML data to a CSV file.
import csv
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Column1', 'Column2'])  # headers
    writer.writerow([data1, data2])  # data1, data2: values extracted from the parsed XML
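To connect the parsing and CSV steps, the sketch below writes one row per XML element. It uses an in-memory `io.StringIO` buffer instead of `output.csv` so that it runs without touching the filesystem; the sample document and the column names are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag and attribute names are illustrative.
root = ET.fromstring(
    '<library>'
    '<book id="b1"><title>Dune</title><year>1965</year></book>'
    '<book id="b2"><title>Neuromancer</title><year>1984</year></book>'
    '</library>'
)

# newline="" leaves line endings to the csv module, as with open(..., newline='').
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

writer.writerow(["id", "title", "year"])  # header row
for book in root.findall("book"):
    # One CSV row per <book>: an attribute plus the text of two child elements.
    writer.writerow([book.get("id"), book.findtext("title"), book.findtext("year")])

csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, replace the buffer with `open('output.csv', 'w', newline='')` as in the snippet above.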
Parsing and converting XML in Python is straightforward thanks to the variety of libraries available. From built-in options like ElementTree and minidom to third-party libraries like lxml and Beautiful Soup, Python provides several ways to handle XML data efficiently. After parsing, the data can be easily converted to other formats such as JSON or CSV.