
XML Formatter: Format and Validate XML Online (Complete Guide)

13 min read, by DevToolBox

TL;DR

An XML formatter pretty-prints raw XML with proper indentation and checks that it is well-formed. Use DOMParser + XMLSerializer in browsers, xml.etree.ElementTree or lxml in Python, and fast-xml-parser in Node.js. Validate XML against an XSD schema using lxml in Python or javax.xml.validation in Java. Query XML with XPath and transform it with XSLT. Try our free online XML formatter for instant formatting and validation, or follow the code examples below.

What Is XML? Structure, Elements, Attributes, and Namespaces

XML (eXtensible Markup Language) is a W3C standard markup language designed for storing and transporting structured data in a human-readable format. Unlike HTML, XML has no predefined tags — you define your own vocabulary to describe your data. XML separates data from presentation, making it ideal for data interchange between systems.

A well-formed XML document has this fundamental structure:

<?xml version="1.0" encoding="UTF-8"?>
<!-- XML declaration specifies version and character encoding -->
<library xmlns="http://example.com/library"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://example.com/library library.xsd">

  <book id="978-0-13-110362-7" category="programming">
    <title lang="en">The C Programming Language</title>
    <authors>
      <author>Brian W. Kernighan</author>
      <author>Dennis M. Ritchie</author>
    </authors>
    <price currency="USD">45.99</price>
    <description><![CDATA[
      Classic reference for the C language.
      Contains <examples> and & special chars.
    ]]></description>
    <tags>
      <tag>c</tag>
      <tag>programming</tag>
      <tag>systems</tag>
    </tags>
  </book>

</library>

Key XML concepts:

  • Elements: The basic building blocks — <element>content</element>. Every element must have a closing tag or be self-closing: <br/>.
  • Attributes: Name-value pairs inside the opening tag. Values must always be quoted: id="123" or id='123'.
  • Root element: Every XML document must have exactly one root element that contains all others.
  • CDATA sections: <![CDATA[ ... ]]> allows embedding raw text with special characters without escaping.
  • Namespaces: xmlns:prefix="URI" declarations prevent naming conflicts when combining XML vocabularies.
  • Processing instructions: <?xml-stylesheet type="text/xsl" href="style.xsl"?> provide metadata to applications.
  • Comments: <!-- comment --> — cannot contain double hyphens inside.
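
To see how a parser exposes these concepts, here is a minimal sketch using Python's standard-library ElementTree on a shortened copy of the sample document. The `lib` prefix in the lookup map is an arbitrary name chosen for this example; only the namespace URI has to match.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<library xmlns="http://example.com/library">
  <book id="978-0-13-110362-7" category="programming">
    <title lang="en">The C Programming Language</title>
    <description><![CDATA[Contains <examples> and & special chars.]]></description>
  </book>
</library>"""

root = ET.fromstring(SAMPLE)
ns = {"lib": "http://example.com/library"}  # prefix name is our own choice

# The default namespace applies to element names: tags are stored as {uri}local.
print(root.tag)  # {http://example.com/library}library

# Unprefixed attributes are in no namespace, so they are looked up by plain name.
book = root.find("lib:book", ns)
print(book.get("id"), book.get("category"))

# CDATA content arrives as ordinary text with the special characters intact.
desc = book.find("lib:description", ns).text
print(desc)  # Contains <examples> and & special chars.
```

The root element, attributes, and CDATA text all come back as plain Python objects; the namespace is the only part that changes how you address elements.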

XML vs HTML — Key Differences

Feature          | XML                              | HTML
Purpose          | Store and transport data         | Display data in browsers
Tags             | User-defined (any name)          | Predefined (div, p, span...)
Case sensitivity | Case-sensitive (Name ≠ name)     | Case-insensitive
Closing tags     | Mandatory for all elements       | Optional for some (br, img)
Attribute values | Must always be quoted            | Quotes optional in HTML5
Error handling   | Fatal error on any malformed XML | Lenient; browsers auto-correct
White space      | Preserved (significant)          | Collapsed by browsers

XML Formatting and Pretty-Printing — Why Indentation Matters

Parsers treat whitespace between elements as either insignificant (in element-only content) or significant (in mixed content). For data-centric XML (like configuration files), whitespace between elements is insignificant and can be freely added for readability.

Minified XML is valid but hard to read. Pretty-printed XML uses consistent indentation (2 or 4 spaces per level). The xml:space="preserve" attribute signals that whitespace in that element should be preserved:

<!-- Minified XML — valid but unreadable -->
<root><person id="1"><name>Alice</name><age>30</age></person></root>

<!-- Pretty-printed XML — same data, readable -->
<root>
  <person id="1">
    <name>Alice</name>
    <age>30</age>
  </person>
</root>

<!-- xml:space="preserve" preserves whitespace in pre-like content -->
<code xml:space="preserve">
  function hello() {
    return "world";
  }
</code>
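
The difference between insignificant and significant whitespace can be checked directly. The sketch below (Python standard library, 3.9+ for `ET.indent`) compares the minified and pretty-printed documents above by their data, then shows that re-indenting mixed content changes the text. `data_view` is a hypothetical helper written for this example, not a library function.

```python
import xml.etree.ElementTree as ET

def data_view(elem):
    """Hypothetical helper: tag, attributes, stripped text, and children,
    ignoring whitespace that only serves as layout."""
    return (elem.tag, dict(elem.attrib), (elem.text or "").strip(),
            [data_view(child) for child in elem])

minified = '<root><person id="1"><name>Alice</name><age>30</age></person></root>'
pretty = """<root>
  <person id="1">
    <name>Alice</name>
    <age>30</age>
  </person>
</root>"""

# Data-centric XML: both layouts carry the same data.
same = data_view(ET.fromstring(minified)) == data_view(ET.fromstring(pretty))
print(same)  # True

# Mixed content: inline elements sit directly next to each other,
# so inserting indentation between them changes the text itself.
mixed = ET.fromstring("<p><b>Hel</b><i>lo</i></p>")
before = "".join(mixed.itertext())
ET.indent(mixed)
after = "".join(mixed.itertext())
print(repr(before), repr(after))
```

This is why formatters are safe on configuration-style XML but need care (or `xml:space="preserve"`) on document-style XML with inline markup.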

JavaScript — DOMParser and XMLSerializer (Browser)

The browser provides DOMParser to parse XML strings into DOM documents and XMLSerializer to serialize DOM back to strings. For pretty-printing, you must implement indentation manually since browsers do not format XML by default.

// Parse XML string in browser
function parseXml(xmlString: string): Document | null {
  const parser = new DOMParser();
  const doc = parser.parseFromString(xmlString, 'application/xml');

  // Check for parse errors
  const parseError = doc.querySelector('parsererror');
  if (parseError) {
    console.error('XML parse error:', parseError.textContent);
    return null;
  }
  return doc;
}

// Serialize DOM document back to string
function serializeXml(doc: Document): string {
  const serializer = new XMLSerializer();
  return serializer.serializeToString(doc);
}

// Pretty-print XML with indentation
function prettyPrintXml(xmlString: string, indent = '  '): string {
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(xmlString, 'application/xml');

  // Check for parse errors
  const error = xmlDoc.querySelector('parsererror');
  if (error) throw new Error('Invalid XML: ' + error.textContent);

  return formatNode(xmlDoc.documentElement, 0, indent);
}

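function escapeXml(value: string, isAttribute = false): string {
  // The DOM stores decoded characters, so they must be re-escaped on output
  const escaped = value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
  return isAttribute ? escaped.replace(/"/g, '&quot;') : escaped;
}

function formatNode(node: Element, depth: number, indent: string): string {
  const pad = indent.repeat(depth);
  const childPad = indent.repeat(depth + 1);

  // Build opening tag with attributes
  let result = pad + '<' + node.nodeName;
  for (const attr of Array.from(node.attributes)) {
    result += ` ${attr.name}="${escapeXml(attr.value, true)}"`;
  }

  // Keep elements and non-blank text (CDATA counts as text). Comments and
  // processing instructions are dropped by this simple formatter.
  const isText = (n: Node) =>
    n.nodeType === Node.TEXT_NODE || n.nodeType === Node.CDATA_SECTION_NODE;
  const children = Array.from(node.childNodes).filter(
    n => n.nodeType === Node.ELEMENT_NODE || (isText(n) && n.textContent?.trim())
  );

  if (children.length === 0) {
    result += '/>';
    return result;
  }

  result += '>';

  const hasElementChildren = children.some(n => n.nodeType === Node.ELEMENT_NODE);

  if (hasElementChildren) {
    result += '\n';
    for (const child of children) {
      if (child.nodeType === Node.ELEMENT_NODE) {
        result += formatNode(child as Element, depth + 1, indent) + '\n';
      } else {
        // Text or CDATA in mixed content: trimmed and placed on its own line
        const text = child.textContent?.trim();
        if (text) result += childPad + escapeXml(text) + '\n';
      }
    }
    result += pad + '</' + node.nodeName + '>';
  } else {
    // Inline text content
    const text = node.textContent?.trim() || '';
    result += escapeXml(text) + '</' + node.nodeName + '>';
  }

  return result;
}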

// Usage
const xml = '<root><person id="1"><name>Alice</name><age>30</age></person></root>';
console.log(prettyPrintXml(xml));
/*
<root>
  <person id="1">
    <name>Alice</name>
    <age>30</age>
  </person>
</root>
*/

// Extract data from XML
function getPersonNames(xmlString: string): string[] {
  const doc = parseXml(xmlString);
  if (!doc) return [];
  return Array.from(doc.querySelectorAll('name')).map(el => el.textContent || '');
}

// Check if XML is well-formed
function isValidXml(xmlString: string): boolean {
  const doc = parseXml(xmlString);
  return doc !== null;
}

Node.js — xml2js and fast-xml-parser

Node.js does not include a native XML parser, so you need npm packages. fast-xml-parser is a fast, pure-JavaScript parser that also ships a validator and a builder. xml2js is older but very widely used. Both convert XML to JavaScript objects and back.

# Install
npm install fast-xml-parser
npm install xml2js  # alternative

// fast-xml-parser (recommended for performance)
import { XMLParser, XMLBuilder, XMLValidator } from 'fast-xml-parser';

const xmlString = `<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="1">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price>39.99</price>
  </book>
  <book id="2">
    <title>The Pragmatic Programmer</title>
    <author>Andrew Hunt</author>
    <price>44.99</price>
  </book>
</library>`;

// Parse XML to JavaScript object
const parser = new XMLParser({
  ignoreAttributes: false,     // include XML attributes
  attributeNamePrefix: '@_',   // prefix attribute keys with @_
  parseAttributeValue: true,   // parse attribute values as numbers/booleans
});
const result = parser.parse(xmlString);
console.log(result.library.book);
// => [{ '@_id': 1, title: 'Clean Code', author: 'Robert C. Martin', price: 39.99 }, ...]

// Validate XML before parsing
const validationResult = XMLValidator.validate(xmlString);
if (validationResult !== true) {
  console.error('XML validation error:', validationResult.err);
}

// Build XML from JavaScript object
const builder = new XMLBuilder({
  ignoreAttributes: false,
  attributeNamePrefix: '@_',
  format: true,              // pretty-print with indentation
  indentBy: '  ',            // 2-space indent
  suppressEmptyNode: true,   // <empty/> instead of <empty></empty>
});
const jsObject = {
  library: {
    book: [
      { '@_id': 1, title: 'Refactoring', author: 'Martin Fowler', price: 49.99 },
      { '@_id': 2, title: 'SICP', author: 'Harold Abelson', price: 39.99 },
    ]
  }
};
const newXml = builder.build(jsObject);
console.log(newXml);

// xml2js — alternative with promise-based API
import { parseStringPromise, Builder } from 'xml2js';

const parsed = await parseStringPromise(xmlString, {
  explicitArray: false,  // don't wrap single elements in arrays
  mergeAttrs: true,      // merge attributes into the element object
  trim: true,
});
console.log(parsed.library.book);

// Build XML from object with xml2js
const xmlBuilder = new Builder({
  renderOpts: { pretty: true, indent: '  ' },
  xmldec: { version: '1.0', encoding: 'UTF-8' },
});
const xmlOutput = xmlBuilder.buildObject({ library: { book: [/* ... */] } });

Python — xml.etree.ElementTree and minidom

Python's standard library includes xml.etree.ElementTree (fast, minimal) and xml.dom.minidom (DOM-based, better for pretty-printing). Python 3.9+ added the indent() function to ElementTree for easy pretty-printing.

import xml.etree.ElementTree as ET
from xml.dom import minidom

xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="1">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
  </book>
</library>"""

# Parse XML
root = ET.fromstring(xml_string)  # from a string
# From a file instead: root = ET.parse('library.xml').getroot()

# Access elements
for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    price = book.find('price').text
    currency = book.find('price').get('currency', 'USD')
    book_id = book.get('id')
    print(f"Book {book_id}: {title} by {author} ({currency} {price})")

# Find all titles using findall with path
titles = [el.text for el in root.findall('./book/title')]

# Iterate all elements recursively
for element in root.iter():
    print(f"Tag: {element.tag}, Text: {element.text}")

# Modify XML
for book in root.findall('book'):
    price_el = book.find('price')
    if price_el is not None:
        old_price = float(price_el.text)
        price_el.text = str(round(old_price * 1.1, 2))  # 10% increase

# Pretty-print with Python 3.9+ indent() function
ET.indent(root, space='  ')
print(ET.tostring(root, encoding='unicode'))

# Pretty-print with minidom (works in older Python)
def pretty_print_xml(xml_string: str) -> str:
    parsed = minidom.parseString(xml_string)
    return parsed.toprettyxml(indent='  ', encoding=None)

pretty = pretty_print_xml(xml_string)
print(pretty)

# Create new XML from scratch
root = ET.Element('library')
root.set('xmlns', 'http://example.com/library')

book = ET.SubElement(root, 'book')
book.set('id', '1')

title = ET.SubElement(book, 'title')
title.text = 'The Pragmatic Programmer'

author = ET.SubElement(book, 'author')
author.text = 'Andrew Hunt'

# Serialize to string
ET.indent(root, space='  ')
xml_output = ET.tostring(root, encoding='unicode', xml_declaration=True)

# Write to file
tree = ET.ElementTree(root)
ET.indent(tree, space='  ')
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

Python — lxml Library (XPath, XSD Validation, XSLT)

lxml is the most feature-rich Python XML library, built on libxml2 and libxslt. It supports full XPath 1.0, XSD/RELAX NG validation, and XSLT 1.0 transformations, and is typically faster than the standard library. Install with pip install lxml.

from lxml import etree

xml_string = b"""<?xml version="1.0" encoding="UTF-8"?>
<library xmlns="http://example.com/library">
  <book id="1" price="39.99">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
  </book>
  <book id="2" price="29.99">
    <title>The Pragmatic Programmer</title>
    <author>Andrew Hunt</author>
  </book>
</library>"""

# Parse XML
root = etree.fromstring(xml_string)
# or from file: root = etree.parse('file.xml').getroot()

# Pretty-print
pretty = etree.tostring(root, pretty_print=True, encoding='unicode')
print(pretty)

# XPath queries — powerful element selection
ns = {'lib': 'http://example.com/library'}

# Get all book titles
titles = root.xpath('//lib:book/lib:title/text()', namespaces=ns)
print(titles)  # => ['Clean Code', 'The Pragmatic Programmer']

# Get books with price under 35
cheap_books = root.xpath('//lib:book[@price < 35]', namespaces=ns)
for book in cheap_books:
    print(book.xpath('lib:title/text()', namespaces=ns))

# Get the first book
first_book = root.xpath('//lib:book[1]', namespaces=ns)

# XSD Schema validation
xsd_string = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="http://example.com/library"
                   xmlns="http://example.com/library"
                   elementFormDefault="qualified">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="author" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:integer" use="required"/>
            <xs:attribute name="price" type="xs:decimal" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

xsd_doc = etree.fromstring(xsd_string)
schema = etree.XMLSchema(xsd_doc)

# Validate XML against schema
if schema.validate(root):
    print("XML is valid!")
else:
    for error in schema.error_log:
        print(f"Validation error: {error.message} (line {error.line})")

# XSLT transformation
xslt_string = b"""<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:lib="http://example.com/library">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr><th>ID</th><th>Title</th><th>Author</th></tr>
          <xsl:for-each select="//lib:book">
            <tr>
              <td><xsl:value-of select="@id"/></td>
              <td><xsl:value-of select="lib:title"/></td>
              <td><xsl:value-of select="lib:author"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>"""

xslt_doc = etree.fromstring(xslt_string)
transform = etree.XSLT(xslt_doc)
result_tree = transform(root)
print(str(result_tree))

XML Validation — DTD vs XSD vs RELAX NG

XML validation checks that a document conforms to a defined structure beyond just being well-formed. There are three major schema languages:

  • DTD (Document Type Definition): The original XML schema format. Simple syntax; supports elements and attributes but not data types, so it cannot check that an element contains a number rather than a string. Used in legacy XML and XHTML. (The HTML5 doctype, <!DOCTYPE html>, does not reference a DTD.)
  • XSD (XML Schema Definition): The W3C standard, written in XML itself. XSD 1.0 supports 44 built-in data types (string, integer, date, boolean, etc.), namespaces, type inheritance, and regular expression patterns for value constraints. Industry standard for SOAP and enterprise XML.
  • RELAX NG: Often considered simpler to write than XSD. Available in XML syntax and compact notation. Defines almost no data types of its own; types and facets such as minInclusive come from an external datatype library, usually XSD Datatypes. Used in document-centric XML, OpenDocument Format, EPUB.

<!-- DTD (Document Type Definition): inline or external -->
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (title, author, price)>
  <!ATTLIST book
    id    ID       #REQUIRED
    category CDATA #IMPLIED>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT price  (#PCDATA)>
]>
<library>
  <book id="b1">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price>39.99</price>
  </book>
</library>

<?xml version="1.0" encoding="UTF-8"?>
<!-- XSD Schema: external file (library.xsd). The XML declaration must come first. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" type="BookType"
                    minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string" maxOccurs="5"/>
      <xs:element name="price">
        <xs:simpleType>
          <xs:restriction base="xs:decimal">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="9999.99"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="isbn" minOccurs="0">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <!-- ISBN-13 pattern: 978-x-xxx-xxxxx-x -->
            <xs:pattern value="\d{3}-\d-\d{3}-\d{5}-\d"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
    <xs:attribute name="category">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="programming"/>
          <xs:enumeration value="science"/>
          <xs:enumeration value="fiction"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>

</xs:schema>

XPath — Querying XML Documents

XPath (XML Path Language) is a query language for selecting nodes from an XML document. It uses a path expression syntax similar to filesystem paths, plus predicates for filtering.

<!-- Sample XML for XPath examples -->
<bookstore>
  <book category="programming" lang="en">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price>39.99</price>
    <year>2008</year>
  </book>
  <book category="programming" lang="en">
    <title>SICP</title>
    <author>Harold Abelson</author>
    <price>29.99</price>
    <year>1996</year>
  </book>
  <book category="science" lang="en">
    <title>A Brief History of Time</title>
    <author>Stephen Hawking</author>
    <price>14.99</price>
    <year>1988</year>
  </book>
</bookstore>

XPath Expression Reference:
/bookstore              — root element bookstore
//book                  — all book elements anywhere
/bookstore/book[1]      — first book child of bookstore
/bookstore/book[last()] — last book
//book/@category        — category attribute of all books
//book[@category]       — books that have a category attribute
//book[@category='programming'] — programming books only
//book[price > 20]      — books more expensive than 20
//book[price > 20 and @category='programming'] — combined predicate
//title/text()          — text content of all titles
//book[contains(title, 'Code')] — books whose title contains "Code"
//book[starts-with(title, 'Clean')] — books starting with "Clean"
count(//book)           — count of all books
sum(//price)            — sum of all prices
//book[year < 2000]/title — titles of books before 2000

// XPath in JavaScript (browser): document.evaluate()
function xpath(expression: string, doc: Document): string[] {
  const result = doc.evaluate(
    expression,
    doc,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  const nodes: string[] = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i)?.textContent || '');
  }
  return nodes;
}

// Usage
const doc = new DOMParser().parseFromString(xmlString, 'application/xml');
const titles = xpath('//title/text()', doc);
// => ['Clean Code', 'SICP', 'A Brief History of Time']

const expensiveBooks = xpath('//book[price > 20]/title/text()', doc);
// => ['Clean Code', 'SICP']

// String value XPath result
function xpathString(expression: string, doc: Document): string {
  const result = doc.evaluate(expression, doc, null, XPathResult.STRING_TYPE, null);
  return result.stringValue;
}

const firstTitle = xpathString('//book[1]/title', doc);
// => 'Clean Code'
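
For comparison, Python's standard-library ElementTree implements only a subset of XPath: attribute and positional predicates work, but numeric comparisons such as `price > 20` and functions such as `sum()` do not. A minimal sketch on a compacted copy of the bookstore sample, doing the unsupported parts in plain Python:

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book category="programming" lang="en"><title>Clean Code</title><price>39.99</price><year>2008</year></book>
  <book category="programming" lang="en"><title>SICP</title><price>29.99</price><year>1996</year></book>
  <book category="science" lang="en"><title>A Brief History of Time</title><price>14.99</price><year>1988</year></book>
</bookstore>"""
store = ET.fromstring(BOOKSTORE)

# Supported: attribute predicate, equivalent to //book[@category='programming']
programming = [b.findtext("title")
               for b in store.findall("./book[@category='programming']")]

# Supported: positional predicates [1] and [last()]
first = store.find("./book[1]").findtext("title")
last = store.find("./book[last()]").findtext("title")

# Not supported by ElementTree: //book[price > 20], so filter in Python
pricey = [b.findtext("title") for b in store.iter("book")
          if float(b.findtext("price")) > 20]

# Not supported by ElementTree: sum(//price), so aggregate in Python
total = round(sum(float(p.text) for p in store.iter("price")), 2)

print(programming, first, last, pricey, total)
```

When a query needs full XPath 1.0, lxml's `xpath()` method accepts the expressions from the reference list unchanged.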

XSLT — Transforming XML to HTML, Text, or Other XML

XSLT (eXtensible Stylesheet Language Transformations) transforms XML documents into another format using template-based rules. An XSLT processor applies the stylesheet to the source XML and produces output. XSLT 1.0 is supported natively by the major browsers and by libxslt (which lxml uses); XSLT 2.0/3.0 need a dedicated processor such as Saxon.

<?xml version="1.0" encoding="UTF-8"?>
<!-- XSLT 1.0 stylesheet: books XML to HTML table -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root template — matches the document root -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Book Catalog</title>
        <style>
          table { border-collapse: collapse; width: 100%; }
          th, td { border: 1px solid #ddd; padding: 8px; }
          th { background: #f4f4f4; }
        </style>
      </head>
      <body>
        <h1>Book Catalog</h1>
        <table>
          <tr>
            <th>Title</th>
            <th>Author</th>
            <th>Price</th>
            <th>Year</th>
          </tr>
          <!-- For each book element, create a table row -->
          <xsl:for-each select="bookstore/book">
            <!-- Sort by price descending -->
            <xsl:sort select="price" data-type="number" order="descending"/>
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="author"/></td>
              <td>$<xsl:value-of select="price"/></td>
              <td><xsl:value-of select="year"/></td>
            </tr>
          </xsl:for-each>
        </table>
        <!-- Conditional output -->
        <p>
          Total books: <xsl:value-of select="count(bookstore/book)"/>
        </p>
        <xsl:if test="count(bookstore/book) > 5">
          <p>Large catalog!</p>
        </xsl:if>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

// Apply XSLT in browser using XSLTProcessor API
async function transformXml(xmlString: string, xsltString: string): Promise<string> {
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(xmlString, 'application/xml');
  const xsltDoc = parser.parseFromString(xsltString, 'application/xml');

  const xsltProcessor = new XSLTProcessor();
  xsltProcessor.importStylesheet(xsltDoc);

  // Apply transformation
  const resultDoc = xsltProcessor.transformToDocument(xmlDoc);
  const serializer = new XMLSerializer();
  return serializer.serializeToString(resultDoc);
}

// Usage
const htmlOutput = await transformXml(booksXml, xsltStylesheet);
document.getElementById('output')!.innerHTML = htmlOutput;

XML Namespaces — Avoiding Name Conflicts

XML namespaces allow elements from different vocabularies to coexist in the same document without name collisions. A namespace is declared with xmlns or xmlns:prefix attributes, which associate a URI with the elements in scope.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Multiple namespaces in one document; the unprefixed xmlns is the default namespace -->
<root
  xmlns="http://default.example.com"
  xmlns:book="http://books.example.com"
  xmlns:price="http://pricing.example.com"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Uses default namespace -->
  <title>Book Catalog</title>

  <!-- Uses book: namespace prefix -->
  <book:catalog version="2.0">
    <book:item book:id="1">
      <book:title>Clean Code</book:title>
      <!-- Mixed namespaces in one element tree -->
      <price:cost price:currency="USD">39.99</price:cost>
    </book:item>
  </book:catalog>

  <!-- Embedded XHTML with its own namespace -->
  <description>
    <xhtml:p>This is a <xhtml:strong>great</xhtml:strong> book.</xhtml:p>
  </description>

</root>

<!-- Real-world: SOAP envelope uses multiple namespaces -->
<soap:Envelope
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Header/>
  <soap:Body>
    <GetStockPrice xmlns="http://myservice.example.com/">
      <StockName>IBM</StockName>
    </GetStockPrice>
  </soap:Body>
</soap:Envelope>

// Handling namespaces in JavaScript
const xmlWithNs = `<lib:library xmlns:lib="http://books.example.com">
  <lib:book lib:id="1">
    <lib:title>Clean Code</lib:title>
  </lib:book>
</lib:library>`;

const parser = new DOMParser();
const doc = parser.parseFromString(xmlWithNs, 'application/xml');

// Use getElementsByTagNameNS for namespace-aware queries
const books = doc.getElementsByTagNameNS('http://books.example.com', 'book');
console.log(books.length);  // => 1

// getAttribute needs namespace for namespaced attributes
const book = books[0];
const id = book.getAttributeNS('http://books.example.com', 'id');
console.log(id);  // => '1'

// XPathEvaluator with namespace resolver
const nsResolver = {
  lookupNamespaceURI: (prefix: string | null) => {
    if (prefix === 'lib') return 'http://books.example.com';
    return null;
  }
};
const result = doc.evaluate(
  '//lib:book/lib:title',
  doc,
  nsResolver as XPathNSResolver,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);
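
The same namespace-aware lookups can be sketched in Python with the standard-library ElementTree. The prefix `b` in the lookup map is an arbitrary choice for this example; it does not have to match the `lib` prefix used in the document, because only the URI identifies the namespace.

```python
import xml.etree.ElementTree as ET

XML_WITH_NS = """<lib:library xmlns:lib="http://books.example.com">
  <lib:book lib:id="1">
    <lib:title>Clean Code</lib:title>
  </lib:book>
</lib:library>"""

BOOKS_NS = "http://books.example.com"
root = ET.fromstring(XML_WITH_NS)

# Option 1: pass a prefix-to-URI map to find()/findall()
ns = {"b": BOOKS_NS}
titles = [t.text for t in root.findall("b:book/b:title", ns)]

# Option 2: spell out the {uri}local form that ElementTree uses internally
book = root.find(f"{{{BOOKS_NS}}}book")

# Namespaced attributes use the same {uri}local form
book_id = book.get(f"{{{BOOKS_NS}}}id")

print(titles, book_id)
```

A plain `root.find("book")` returns `None` here, which is the most common symptom of forgetting the namespace.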

CDATA Sections — Embedding Special Characters

CDATA sections let you include text that contains XML special characters (<, >, &) without escaping them. The parser treats the entire CDATA block as raw character data.

<!-- CDATA vs entity references — choose based on amount of special chars -->

<!-- Without CDATA — requires escaping every special character -->
<code>
  if (x &lt; 10 &amp;&amp; y &gt; 0) {
    document.write(&quot;Hello &amp; World&quot;);
  }
</code>

<!-- With CDATA — no escaping needed, much more readable -->
<code><![CDATA[
  if (x < 10 && y > 0) {
    document.write("Hello & World");
  }
]]></code>

<!-- CDATA is commonly used for: -->
<!-- 1. Embedding script code in XML-based formats -->
<script type="text/javascript">
<![CDATA[
  function validate() {
    var re = /^[a-z]+$/i;
    return re.test(document.getElementById('name').value);
  }
]]>
</script>

<!-- 2. Embedding SQL in configuration XML -->
<query><![CDATA[
  SELECT * FROM users
  WHERE name LIKE '%O''Brien%'
  AND age > 21 AND age < 65;
]]></query>

<!-- 3. Embedding HTML content in XML -->
<description><![CDATA[
  <p>Learn <strong>XML formatting</strong> &amp; validation.</p>
  <ul>
    <li>Parse XML</li>
    <li>Validate with XSD</li>
  </ul>
]]></description>

<!-- CDATA cannot contain ]]> (the closing sequence) -->
<!-- Workaround: split into two CDATA sections -->
<text><![CDATA[First part ]]]]><![CDATA[> second part]]></text>

<!-- XML Entity References for single special characters -->
<!-- &amp;  = &    (ampersand)                          -->
<!-- &lt;   = <    (less than)                          -->
<!-- &gt;   = >    (greater than)                       -->
<!-- &quot; = "    (double quote)                        -->
<!-- &apos; = '    (single quote / apostrophe)           -->
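
A short Python sketch (standard library only) can confirm that entity escaping and CDATA are two spellings of the same text, and that the split workaround for `]]>` round-trips. `to_cdata` is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if (x < 10 && y > 0) { write("Hello & World"); }'

# Two ways to embed the same text
escaped_doc = f"<code>{escape(raw)}</code>"      # &lt; &amp; &gt;
cdata_doc = f"<code><![CDATA[{raw}]]></code>"    # raw characters

from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text

def to_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, splitting any ']]>' so the
    closing sequence never appears inside a single section."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

tricky = "First part ]]> second part"
round_trip = ET.fromstring(f"<text>{to_cdata(tricky)}</text>").text

print(from_escaped == from_cdata == raw, round_trip == tricky)
```

Once parsed, the application cannot tell which form the author used, so the choice is purely about readability of the source file.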

XML vs JSON — Comparison Table and When to Use Each

Both XML and JSON are widely used for data interchange, but they have different strengths. JSON dominates modern REST APIs while XML remains essential for SOAP, RSS, SVG, and document formats.

Feature        | XML                                    | JSON
Verbosity      | More verbose (closing tags)            | Compact, less overhead
Parsing speed  | Slower (more complex)                  | Faster (JSON.parse)
Schema support | DTD, XSD, RELAX NG (mature)            | JSON Schema (less mature)
Attributes     | Elements + attributes                  | Keys and values only
Comments       | Supported                              | Not supported
Namespaces     | Full namespace support                 | No native namespaces
Query language | XPath, XQuery                          | JSONPath, jq
Transform      | XSLT                                   | jq, JavaScript
Binary data    | Base64-encoded text                    | Base64 string
Use cases      | SOAP, RSS, SVG, Office formats, config | REST APIs, web storage, config

Choose XML when: working with SOAP/WSDL web services, generating or parsing RSS/Atom feeds, working with SVG graphics, processing Microsoft Office Open XML (.docx, .xlsx), or building Android UI layouts. Choose JSON when: building REST APIs, storing data in NoSQL databases, sending data between frontend and backend, or working with modern JavaScript frameworks.
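
Because XML has attributes and JSON does not, converting between the two always involves a mapping convention. The sketch below uses one common but arbitrary convention, assumed for this example: attributes become `@name` keys, text next to attributes goes under `#text`, and repeated child tags become lists. It ignores namespaces and mixed content.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Hypothetical converter: '@name' for attributes, '#text' for text
    alongside attributes or children, lists for repeated child tags."""
    node = {f"@{key}": value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Second occurrence of a tag: promote the entry to a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # leaf element with text only
        node["#text"] = text
    return node

xml_doc = ('<library><book id="1"><title>Clean Code</title>'
           '<price currency="USD">39.99</price></book></library>')
root = ET.fromstring(xml_doc)
as_json = json.dumps({root.tag: element_to_dict(root)}, indent=2)
print(as_json)
```

Note that every value comes out as a string: XML carries no type information without a schema, which is one of the practical differences summarized in the table.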

Common XML Errors and How to Fix Them

XML parsers are strict — any well-formedness violation is a fatal error. Here are the most common XML errors and how to fix them:

<!-- ERROR 1: Unclosed tags -->
<!-- Invalid -->
<root>
  <name>Alice
  <age>30</age>
</root>
<!-- Fixed -->
<root>
  <name>Alice</name>
  <age>30</age>
</root>

<!-- ERROR 2: Tags not properly nested (overlapping) -->
<!-- Invalid -->
<bold><italic>text</bold></italic>
<!-- Fixed -->
<bold><italic>text</italic></bold>

<!-- ERROR 3: Unquoted attribute values -->
<!-- Invalid -->
<book id=1 category=programming>
<!-- Fixed -->
<book id="1" category="programming">

<!-- ERROR 4: Unescaped special characters in text content -->
<!-- Invalid — ampersand must be escaped -->
<title>Kernighan & Ritchie</title>
<!-- Fixed -->
<title>Kernighan &amp; Ritchie</title>
<!-- Or use CDATA -->
<title><![CDATA[Kernighan & Ritchie]]></title>

<!-- ERROR 5: Unescaped angle brackets in attribute values -->
<!-- Invalid -->
<filter condition="price < 50">
<!-- Fixed -->
<filter condition="price &lt; 50">

<!-- ERROR 6: Invalid characters in element names -->
<!-- Invalid — element names cannot start with a number or contain spaces -->
<1st-book>, <book name>, <book@store>
<!-- Fixed -->
<first-book>, <book-name>, <book-store>

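<!-- ERROR 7: Missing encoding declaration on a non-UTF-8 file -->
<!-- Without a declaration, parsers assume UTF-8 (or UTF-16 when a BOM is present), -->
<!-- so a file saved as, say, ISO-8859-1 typically fails at the first non-ASCII byte -->
<!-- Fix: save the file as UTF-8, or declare the encoding it actually uses -->
<?xml version="1.0" encoding="UTF-8"?>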

<!-- ERROR 8: Byte Order Mark (BOM) issues -->
<!-- Some editors add a UTF-8 BOM (EF BB BF) before the XML declaration -->
<!-- Conforming parsers accept it, but tools that expect the file to start with the declaration may fail; saving without a BOM is safest -->
<!-- In vim: :set nobomb | :w -->
<!-- In Python: use the 'utf-8-sig' codec for BOM-aware reading -->

import xml.etree.ElementTree as ET
# Remove BOM if present
with open('file.xml', 'r', encoding='utf-8-sig') as f:
    content = f.read()
root = ET.fromstring(content)

<!-- ERROR 9: Duplicate attribute names -->
<!-- Invalid — attributes must be unique within an element -->
<book id="1" id="2">
<!-- Fixed — use different attribute names or child elements -->
<book primary-id="1" alternate-id="2">

<!-- ERROR 10: Multiple root elements -->
<!-- Invalid -->
<root1></root1>
<root2></root2>
<!-- Fixed — wrap in a single root -->
<root>
  <root1></root1>
  <root2></root2>
</root>
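
Several of the errors above can be detected programmatically. The following is a minimal sketch using Python's standard-library ElementTree, which raises `ParseError` with a line and column on the first well-formedness violation. `check_well_formed` is a hypothetical helper for this example, and the exact error messages come from the underlying expat parser, so they are not shown here.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text: str):
    """Return None if the text is well-formed XML,
    otherwise a (line, column, message) tuple for the first error."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        line, column = exc.position
        return (line, column, str(exc))

samples = {
    "unclosed tag": "<root>\n  <name>Alice\n  <age>30</age>\n</root>",
    "overlapping tags": "<bold><italic>text</bold></italic>",
    "unquoted attribute": "<book id=1/>",
    "bare ampersand": "<title>Kernighan & Ritchie</title>",
    "duplicate attribute": '<book id="1" id="2"/>',
    "two roots": "<root1/><root2/>",
    "well-formed": "<root><name>Alice</name></root>",
}

results = {name: check_well_formed(text) for name, text in samples.items()}
for name, outcome in results.items():
    print(f"{name}: {'OK' if outcome is None else outcome}")
```

The parser stops at the first error, so a document with several problems has to be fixed and re-checked one error at a time.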

Large XML Streaming — SAX Parsers and iterparse

DOM parsers load the entire XML document into memory as a tree structure. For large XML files (hundreds of MB or GB), this is impractical. SAX (Simple API for XML) and streaming parsers process XML as a sequence of events without loading the whole document.

# Python — ElementTree iterparse (streaming, memory-efficient)
import xml.etree.ElementTree as ET

def count_books_streaming(filepath: str) -> int:
    """Process a large XML file with millions of books"""
    count = 0

    # iterparse yields (event, element) tuples
    for event, element in ET.iterparse(filepath, events=('start', 'end')):
        if event == 'end' and element.tag == 'book':
            count += 1
            # CRITICAL: Clear the element to free memory
            element.clear()

    return count

# Extract data from large XML with iterparse
def extract_books(filepath: str, max_price: float) -> list[dict]:
    books = []
    current_book = {}

    for event, element in ET.iterparse(filepath, events=('start', 'end')):
        if event == 'start':
            if element.tag == 'book':
                current_book = {'id': element.get('id')}

        elif event == 'end':
            if element.tag == 'title':
                current_book['title'] = element.text
            elif element.tag == 'price':
                current_book['price'] = float(element.text or 0)
            elif element.tag == 'book':
                if current_book.get('price', 0) <= max_price:
                    books.append(current_book.copy())
                element.clear()  # Free memory

    return books

# SAX parser in Python: builds no tree at all, only fires callbacks per event
import xml.sax
import xml.sax.handler

class BookHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.books = []
        self.current_book = {}
        self.current_element = ''
        self.current_text = ''

    def startElement(self, name, attrs):
        self.current_element = name
        self.current_text = ''
        if name == 'book':
            self.current_book = {'id': attrs.get('id', '')}

    def characters(self, content):
        self.current_text += content

    def endElement(self, name):
        if name in ('title', 'author', 'price'):
            self.current_book[name] = self.current_text.strip()
        elif name == 'book':
            self.books.append(self.current_book.copy())
            self.current_book = {}

handler = BookHandler()
xml.sax.parse('large_library.xml', handler)
print(f"Parsed {len(handler.books)} books")
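
One caveat with the iterparse approach: element.clear() empties an element, but the emptied element remains a child of the root, so the root's child list still grows over millions of records. A common workaround is to keep a reference to the root and clear it after each record. The sketch below assumes <book> elements with <title> and <price> children, as used throughout this guide. iter_books is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Union


def iter_books(source: Union[str, IO[bytes]]) -> Iterator[dict]:
    """Yield one dict per <book> while keeping the in-memory tree small."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, element in context:
        if event == "end" and element.tag == "book":
            yield {
                "id": element.get("id"),
                "title": element.findtext("title"),
                "price": float(element.findtext("price") or 0),
            }
            # Drop everything parsed so far from the root, including this book
            root.clear()


# Demo on an in-memory document; pass a file path for real data
sample = io.BytesIO(
    b'<library><book id="1"><title>A</title><price>10</price></book>'
    b'<book id="2"><title>B</title><price>20.5</price></book></library>'
)
for book in iter_books(sample):
    print(book)
```

Because the function is a generator, the caller can filter or aggregate records one at a time without ever holding the full list in memory.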

// Node.js SAX streaming with 'sax' package
// npm install sax @types/sax

import sax from 'sax';
import { createReadStream } from 'fs';

interface Book {
  id: string;
  title: string;
  author: string;
  price: number;
}

function streamParseBooks(filePath: string): Promise<Book[]> {
  return new Promise((resolve, reject) => {
    const books: Book[] = [];
    let currentBook: Partial<Book> = {};
    let currentElement = '';
    let currentText = '';

    const parser = sax.createStream(true, { lowercase: false });

    parser.on('opentag', (node) => {
      currentElement = node.name;
      currentText = '';
      if (node.name === 'book') {
        currentBook = { id: node.attributes['id'] as string };
      }
    });

    parser.on('text', (text) => {
      currentText += text;
    });

    parser.on('closetag', (tagName) => {
      const text = currentText.trim();
      if (tagName === 'title') currentBook.title = text;
      else if (tagName === 'author') currentBook.author = text;
      else if (tagName === 'price') currentBook.price = parseFloat(text);
      else if (tagName === 'book') {
        books.push(currentBook as Book);
        currentBook = {};
      }
      currentText = '';
    });

    parser.on('error', reject);
    parser.on('end', () => resolve(books));

    createReadStream(filePath).pipe(parser);
  });
}

// Usage
const books = await streamParseBooks('library.xml');
console.log(`Parsed ${books.length} books from large file`);

Real-World XML Formats — RSS, SOAP, Maven, Android, Office

RSS 2.0 Feed

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Dev Blog</title>
    <link>https://example.com</link>
    <description>Latest developer articles</description>
    <language>en-us</language>
    <lastBuildDate>Fri, 27 Feb 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://example.com/rss.xml" rel="self" type="application/rss+xml"/>

    <item>
      <title>XML Formatting Guide 2026</title>
      <link>https://example.com/xml-guide</link>
      <description><![CDATA[Complete guide to XML formatting and validation.]]></description>
      <pubDate>Fri, 27 Feb 2026 12:00:00 +0000</pubDate>
      <guid isPermaLink="true">https://example.com/xml-guide</guid>
      <category>XML</category>
    </item>
  </channel>
</rss>
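
Reading a feed like this needs nothing beyond the standard library. The following is a rough sketch, assuming a trimmed-down copy of the feed above. read_feed_items and RSS_SAMPLE are illustrative names, and a production reader would also need to handle missing fields and other date formats.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Trimmed-down sample feed (assumption: same shape as the RSS 2.0 example)
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Dev Blog</title>
    <item>
      <title>XML Formatting Guide 2026</title>
      <link>https://example.com/xml-guide</link>
      <pubDate>Fri, 27 Feb 2026 12:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def read_feed_items(rss_text: str) -> list[dict]:
    """Return title, link and parsed publication date for each <item>."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        return []
    items = []
    for item in channel.findall("item"):
        pub_date = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS dates use the RFC 822 format, which email.utils understands
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return items


for entry in read_feed_items(RSS_SAMPLE):
    print(entry["title"], entry["published"])
```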

Maven pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>my-app</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>

  <properties>
    <maven.compiler.source>21</maven.compiler.source>
    <maven.compiler.target>21</maven.compiler.target>
    <spring.boot.version>3.3.0</spring.boot.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
      <version>${spring.boot.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
        <version>${spring.boot.version}</version>
      </plugin>
    </plugins>
  </build>

</project>
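
Because every element in a POM lives in the Maven namespace, a plain find("dependencies") returns nothing. As a hedged sketch, the code below lists dependency coordinates with ElementTree and a prefix map, and substitutes ${...} placeholders from <properties>. list_dependencies and POM_SAMPLE are hypothetical names. Real Maven property resolution (parent POMs, profiles, built-in properties) is far more involved than this.

```python
import re
import xml.etree.ElementTree as ET

# Prefix map for ElementTree queries; "m" is an arbitrary local prefix
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

POM_SAMPLE = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <spring.boot.version>3.3.0</spring.boot.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
      <version>${spring.boot.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>"""


def list_dependencies(pom_text: str) -> list[str]:
    """Return 'groupId:artifactId:version' for each direct dependency."""
    root = ET.fromstring(pom_text)

    # Tags come back as "{namespace}name", so strip the namespace part
    props = {
        prop.tag.split("}", 1)[-1]: (prop.text or "")
        for prop in root.findall("m:properties/*", POM_NS)
    }

    def resolve(value: str) -> str:
        # Replace ${name} with the property value; leave unknown names as-is
        return re.sub(
            r"\$\{([^}]+)\}",
            lambda match: props.get(match.group(1), match.group(0)),
            value,
        )

    coordinates = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        parts = [
            dep.findtext(f"m:{name}", default="", namespaces=POM_NS)
            for name in ("groupId", "artifactId", "version")
        ]
        coordinates.append(":".join(resolve(part) for part in parts))
    return coordinates


print(list_dependencies(POM_SAMPLE))
```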

Android Layout XML (res/layout/activity_main.xml)

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
  xmlns:android="http://schemas.android.com/apk/res/android"
  xmlns:app="http://schemas.android.com/apk/res-auto"
  android:layout_width="match_parent"
  android:layout_height="match_parent"
  android:orientation="vertical"
  android:padding="16dp">

  <TextView
    android:id="@+id/title"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:text="@string/app_title"
    android:textSize="24sp"
    android:textStyle="bold"/>

  <EditText
    android:id="@+id/input"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:hint="@string/enter_text"
    android:inputType="text"/>

  <Button
    android:id="@+id/submit"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:text="@string/submit"
    app:backgroundTint="@color/primary"/>

</LinearLayout>
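
Attributes such as android:id are namespaced, so a parser exposes them under the full namespace URI rather than the android: prefix. A small sketch, assuming a shortened copy of the layout above (list_view_ids and LAYOUT_SAMPLE are illustrative names), shows how to read them with ElementTree:

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Shortened sample layout (assumption: same structure as the example above)
LAYOUT_SAMPLE = """<LinearLayout
  xmlns:android="http://schemas.android.com/apk/res/android"
  android:orientation="vertical">
  <TextView android:id="@+id/title" android:text="@string/app_title"/>
  <EditText android:id="@+id/input" android:hint="@string/enter_text"/>
  <Button android:id="@+id/submit" android:text="@string/submit"/>
</LinearLayout>"""


def list_view_ids(layout_text: str) -> list[tuple[str, str]]:
    """Return (widget tag, id name) for every element that declares android:id."""
    root = ET.fromstring(layout_text)
    # ElementTree stores namespaced attributes as "{uri}localname"
    id_attr = f"{{{ANDROID_NS}}}id"
    views = []
    for element in root.iter():
        view_id = element.get(id_attr)
        if view_id is not None:
            views.append((element.tag, view_id.removeprefix("@+id/")))
    return views


print(list_view_ids(LAYOUT_SAMPLE))
```

Looking up element.get("android:id") instead would return None, because the prefix is only a document-local alias for the URI.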

SOAP Request and Response

<!-- SOAP 1.1 Request -->
POST /StockPrice HTTP/1.1
Host: www.example.com
Content-Type: text/xml; charset=utf-8
SOAPAction: "GetStockPrice"

<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:m="http://www.example.com/stock/">
  <soap:Header>
    <m:AuthToken>Bearer eyJhbGciOiJSUzI1NiJ9...</m:AuthToken>
  </soap:Header>
  <soap:Body>
    <m:GetStockPrice>
      <m:StockName>IBM</m:StockName>
      <m:Currency>USD</m:Currency>
    </m:GetStockPrice>
  </soap:Body>
</soap:Envelope>

<!-- SOAP Response -->
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:GetStockPriceResponse xmlns:m="http://www.example.com/stock/">
      <m:Price>175.43</m:Price>
      <m:Currency>USD</m:Currency>
      <m:Timestamp>2026-02-27T12:00:00Z</m:Timestamp>
    </m:GetStockPriceResponse>
  </soap:Body>
</soap:Envelope>

<!-- SOAP Fault (error response) -->
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Invalid stock symbol</faultstring>
      <detail>
        <m:StockError xmlns:m="http://www.example.com/stock/">
          <m:ErrorCode>INVALID_SYMBOL</m:ErrorCode>
        </m:StockError>
      </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
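
On the client side, the response has to be checked for a Fault before the payload is read. The sketch below does this with ElementTree, assuming the envelope shapes shown above. read_stock_price, SoapFault, and the two sample strings are hypothetical names for illustration. A real client would normally rely on a SOAP library and the service's WSDL.

```python
import xml.etree.ElementTree as ET

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "http://www.example.com/stock/",
}

RESPONSE_SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:GetStockPriceResponse xmlns:m="http://www.example.com/stock/">
      <m:Price>175.43</m:Price>
    </m:GetStockPriceResponse>
  </soap:Body>
</soap:Envelope>"""

FAULT_SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Invalid stock symbol</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""


class SoapFault(Exception):
    """Raised when the response body carries a soap:Fault."""


def read_stock_price(envelope_text: str) -> float:
    body = ET.fromstring(envelope_text).find("soap:Body", NS)
    if body is None:
        raise ValueError("no soap:Body element found")
    fault = body.find("soap:Fault", NS)
    if fault is not None:
        # In SOAP 1.1, faultcode and faultstring carry no namespace
        code = fault.findtext("faultcode")
        reason = fault.findtext("faultstring")
        raise SoapFault(f"{code}: {reason}")
    price = body.findtext("m:GetStockPriceResponse/m:Price", namespaces=NS)
    if price is None:
        raise ValueError("no m:Price element found")
    return float(price)


print(read_stock_price(RESPONSE_SAMPLE))
try:
    read_stock_price(FAULT_SAMPLE)
except SoapFault as exc:
    print("fault:", exc)
```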

Java — XML Validation with JAXB and javax.xml

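Java ships comprehensive XML support in the standard library: DocumentBuilder parses XML, javax.xml.validation.Validator validates against schemas, and Transformer serializes and pretty-prints documents. JAXB, which marshals between Java objects and XML, was removed from the JDK in Java 11 and must now be added as a separate dependency.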

import javax.xml.parsers.*;
import javax.xml.validation.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;  // used by prettyPrint below
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import java.io.*;

// Parse and validate XML against XSD in Java
public class XmlValidator {

    public static Document parseXml(String xmlFilePath) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // Required for namespace processing
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new File(xmlFilePath));
    }

    public static boolean validateAgainstXsd(String xmlFilePath, String xsdFilePath) {
        try {
            SchemaFactory schemaFactory =
                SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
            Schema schema = schemaFactory.newSchema(new File(xsdFilePath));
            Validator validator = schema.newValidator();

            // Collect all errors instead of throwing on first error
            final java.util.List<String> errors = new java.util.ArrayList<>();
            validator.setErrorHandler(new org.xml.sax.ErrorHandler() {
                public void warning(org.xml.sax.SAXParseException e) {
                    errors.add("Warning: " + e.getMessage());
                }
                public void error(org.xml.sax.SAXParseException e) {
                    errors.add("Error at line " + e.getLineNumber() + ": " + e.getMessage());
                }
                public void fatalError(org.xml.sax.SAXParseException e) {
                    errors.add("Fatal: " + e.getMessage());
                }
            });

            validator.validate(new StreamSource(new File(xmlFilePath)));

            if (errors.isEmpty()) {
                System.out.println("XML is valid!");
                return true;
            } else {
                errors.forEach(System.err::println);
                return false;
            }
        } catch (Exception e) {
            System.err.println("Validation failed: " + e.getMessage());
            return false;
        }
    }

    public static void prettyPrint(Document doc) throws Exception {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter writer = new StringWriter();
        transformer.transform(
            new DOMSource(doc),
            new StreamResult(writer)
        );
        System.out.println(writer.toString());
    }

    // XPath queries in Java
    public static String xpathQuery(Document doc, String expression) throws Exception {
        javax.xml.xpath.XPathFactory xpathFactory =
            javax.xml.xpath.XPathFactory.newInstance();
        javax.xml.xpath.XPath xpath = xpathFactory.newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXml("library.xml");
        boolean valid = validateAgainstXsd("library.xml", "library.xsd");
        prettyPrint(doc);

        // XPath query
        String firstTitle = xpathQuery(doc, "//book[1]/title");
        System.out.println("First book: " + firstTitle);
    }
}

Key Takeaways

  • XML is strict: all tags must be closed, attributes must be quoted, and any malformed document causes a fatal error. HTML is lenient but XML is not.
  • Use DOMParser + XMLSerializer in browsers for XML parsing; implement custom indentation for pretty-printing since browsers do not format by default.
  • fast-xml-parser is a widely used Node.js library: pure JavaScript with no native bindings and fast parsing. Use XMLValidator.validate() before parsing.
  • Python 3.9+ added ET.indent() to ElementTree, making pretty-printing simple without minidom. Use lxml for XPath, XSD validation, and XSLT.
  • XSD (XML Schema) is the industry standard for XML validation — it validates data types, patterns, cardinality, and namespaces. DTD is legacy; RELAX NG is simpler than XSD.
  • XPath expressions select nodes using path syntax. Use XPathEvaluator in browsers and element.xpath() in Python lxml. Always specify namespaces when querying namespaced XML.
  • For large XML files, use streaming parsers: Python iterparse() with element.clear(), or Node.js sax stream. DOM parsers load the entire file into memory.
  • CDATA sections (<![CDATA[ ... ]]>) let you embed raw text with special characters. Use for embedding code, SQL, or HTML inside XML without escaping every & and <.
  • XML namespaces prevent element name conflicts. Always specify the namespace when processing namespaced XML: querySelector cannot match by namespace, so use getElementsByTagNameNS() instead.
  • Real-world XML: RSS/Atom for feeds, SOAP for web services, Maven pom.xml for Java builds, Android layouts, and Office Open XML (.docx, .xlsx) all rely on XML.