How can I parse heavily nested XML in Java without using DOM?

Content sourced from Stack Overflow; translated and used under the CC BY-SA 3.0 license.

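I have been tasked with fixing a rather annoying heap memory problem. IBM provides the Cognos SDK, which we use from Java. We query all packages stored in the content store, and they are returned as XML. We then parse that XML and write it to a SQL database. Profiling shows that the worst memory consumer is char[], which is not very informative (the heap is also large and hard to analyze), but it does point to the DOM parser.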

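We are talking about 500-1500 XML files (technically, XML text streams) that are heavily nested and vary in size and occasionally in structure. Sizes range from a few KB to 30 MB, and after about 300 packages the program uses more than 8 GB of memory. The programmer before me dealt with this by calling System.gc manually after every XML parse, which I would like to remove (it does not actually solve the problem; it only makes the program workable on the smallest server, which has 500 packages).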

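I tried JAXB, but the XML has an odd structure that makes JAXB hard to use here (there is a "folder or querySubject" choice). Last week I spent a few hours on StAX but could not get it working at all, and the same goes for Woodstox. I could not find examples or tutorials for this kind of structure. JDOM is next on my list (I have read that it handles memory better than plain DOM), but I cannot figure out how to make it parse as deeply as DOM does. The current DOM parsing: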

        // Parse the whole document into a DOM tree
        is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
        xmlDoc = builder.parse(is);
        is.close();
        String _path, datatype, regularAggregate, description, formula;
        String table, tableLoc;

        // Visit every element in the document, regardless of depth
        NodeList elements = xmlDoc.getElementsByTagName("*");
        for (int j = 0; j < elements.getLength(); j++) {
            Element element = (Element) elements.item(j);
            String nodeName = element.getNodeName();
            // Compare strings with equals(); == only compares references
            if (nodeName.equals("queryItem") || nodeName.equals("measure")
                    || nodeName.equals("calculation") || nodeName.equals("filter")) {
                if (element.hasAttribute("_path")) {
                    _path = element.getAttribute("_path");
                }

and so on for each attribute.

My JDOM attempt. Currently it only prints the root element, and I have not been able to get deeper than the first child level:

SAXBuilder saxBuilder = new SAXBuilder();
Document document = saxBuilder.build(inputFile);

System.out.println("Root element :" + document.getRootElement().getName());
Element root = document.getRootElement();

List<Element> rList = root.getChildren("folder");

if (rList != null) {
    for (Element node : rList) {
        List<Element> elements = node.getChildren("queryItem");
        if (elements != null) {
            for (Element a : elements) {
                System.out.println(a.getAttribute("_path"));
            }
            elements.size();
            rList.removeAll(elements);
        }
    }
}

XSD structure generated from a random package:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="ResponseRoot">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="folder"/>
        <xs:element ref="package"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="package">
    <xs:complexType>
      <xs:attribute name="description" use="required"/>
      <xs:attribute name="name" use="required"/>
      <xs:attribute name="screenTip" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="folder">
    <xs:complexType>
      <xs:sequence>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="folder"/>
          <xs:element ref="querySubject"/>
        </xs:choice>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="filter"/>
      </xs:sequence>
      <xs:attribute name="_path" use="required"/>
      <xs:attribute name="_ref" use="required"/>
      <xs:attribute name="description" use="required"/>
      <xs:attribute name="isNamespace" use="required" type="xs:integer"/>
      <xs:attribute name="name" use="required"/>
      <xs:attribute name="screenTip" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="querySubject">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="queryItem"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="queryItemFolder"/>
      </xs:sequence>
      <xs:attribute name="_path" use="required"/>
      <xs:attribute name="_ref" use="required"/>
      <xs:attribute name="description" use="required"/>
      <xs:attribute name="name" use="required"/>
      <xs:attribute name="screenTip" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="filter">
    <xs:complexType>
      <xs:attribute name="_path" use="required"/>
      <xs:attribute name="_ref" use="required"/>
      <xs:attribute name="description" use="required"/>
      <xs:attribute name="expression" use="required"/>
      <xs:attribute name="name" use="required"/>
      <xs:attribute name="screenTip" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="queryItem">
    <xs:complexType>
      <xs:attribute name="_path" use="required"/>
      <xs:attribute name="_ref" use="required"/>
      <xs:attribute name="currency" use="required"/>
      <xs:attribute name="datatype" use="required" type="xs:NCName"/>
      <xs:attribute name="description" use="required"/>
      <xs:attribute name="displayType" use="required" type="xs:NCName"/>
      <xs:attribute name="expression" use="required"/>
      <xs:attribute name="name" use="required"/>
      <xs:attribute name="promptCascadeOnRef" use="required"/>
      <xs:attribute name="promptDisplayItemRef" use="required"/>
      <xs:attribute name="promptFilterItemRef" use="required"/>
      <xs:attribute name="promptType" use="required" type="xs:NCName"/>
      <xs:attribute name="regularAggregate" use="required" type="xs:NCName"/>
      <xs:attribute name="screenTip" use="required"/>
      <xs:attribute name="unSortable" use="required" type="xs:integer"/>
      <xs:attribute name="usage" use="required" type="xs:NCName"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="queryItemFolder">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="queryItem"/>
        <xs:element ref="queryItemFolder"/>
      </xs:choice>
      <xs:attribute name="_path" use="required"/>
      <xs:attribute name="_ref" use="required"/>
      <xs:attribute name="description" use="required"/>
      <xs:attribute name="name" use="required"/>
      <xs:attribute name="screenTip" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Answer:

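For nested structures like this, the code is easiest to manage if you write one method per element type. The example below uses the StAX cursor API (XMLStreamReader):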

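import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// The methods below are meant to be placed inside a class of your choice.
public static void main(String[] args) throws Exception {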
    String xml = "<root>" +
                   "<folder name=\"A\">" +
                     "<folder name=\"B\">" +
                       "<book name=\"Learn Java\">" +
                         "<chapter name=\"Hello, World!\"/>" +
                         "<chapter name=\"Variables and Types\"/>" +
                       "</book>" +
                     "</folder>" +
                   "</folder>" +
                 "</root>";
    XMLInputFactory factory = XMLInputFactory.newFactory();
    XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
    try {
        reader.nextTag(); // Position on root element
        String tagName = reader.getLocalName();
        if (! tagName.equals("root"))
            throw new XMLStreamException("Expected <root> element, found: " + tagName, reader.getLocation());
        parseRoot(reader);
    } finally {
        reader.close();
    }
}

private static void parseRoot(XMLStreamReader reader) throws XMLStreamException {
    while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
        String tagName = reader.getLocalName();
        if (tagName.equals("folder")) {
            parseFolder(reader, Collections.emptyList());
        } else {
            throw new XMLStreamException("Expected <folder> element, found: " + tagName, reader.getLocation());
        }
    }
}

private static void parseFolder(XMLStreamReader reader, List<String> parentPaths) throws XMLStreamException {
    String folderName = reader.getAttributeValue(null, "name");
    if (folderName == null)
        throw new XMLStreamException("Missing 'name' attribute on <folder> element", reader.getLocation());
    List<String> folderPath = new ArrayList<>(parentPaths.size() + 1);
    folderPath.addAll(parentPaths);
    folderPath.add(folderName);
    while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
        String tagName = reader.getLocalName();
        if (tagName.equals("folder")) {
            parseFolder(reader, folderPath);
        } else if (tagName.equals("book")) {
            parseBook(reader, folderPath);
        } else {
            throw new XMLStreamException("Expected <folder> or <book> element, found: " + tagName, reader.getLocation());
        }
    }
}

private static void parseBook(XMLStreamReader reader, List<String> folderPath) throws XMLStreamException {
    String bookName = reader.getAttributeValue(null, "name");
    if (bookName == null)
        throw new XMLStreamException("Missing 'name' attribute on <book> element", reader.getLocation());
    while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
        String tagName = reader.getLocalName();
        if (tagName.equals("chapter")) {
            parseChapter(reader, folderPath, bookName);
        } else {
            throw new XMLStreamException("Expected <chapter> element, found: " + tagName, reader.getLocation());
        }
    }
}

private static void parseChapter(XMLStreamReader reader, List<String> folderPath, String bookName) throws XMLStreamException {
    String chapterName = reader.getAttributeValue(null, "name");
    if (chapterName == null)
        throw new XMLStreamException("Missing 'name' attribute on <chapter> element", reader.getLocation());
    if (! reader.getElementText().isEmpty())
        throw new XMLStreamException("<chapter> element must be empty", reader.getLocation());
    System.out.println("Found:");
    System.out.println("  Folder:  " + folderPath);
    System.out.println("  Book:    " + bookName);
    System.out.println("  Chapter: " + chapterName);
}

Output:

Found:
  Folder:  [A, B]
  Book:    Learn Java
  Chapter: Hello, World!
Found:
  Folder:  [A, B]
  Book:    Learn Java
  Chapter: Variables and Types
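
The same one-method-per-element idea could be adapted to the schema in the question. The following is only a sketch, assuming every response follows the posted XSD: the class name PackageStaxSketch, the rows list (a stand-in for the database writes), and the sample XML are illustrative and not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class: one method per element type of the question's XSD.
class PackageStaxSketch {

    /** Parses one response and returns "elementName _path" for each queryItem and filter. */
    static List<String> parse(String xml) throws XMLStreamException {
        List<String> rows = new ArrayList<>(); // stand-in for writing database rows
        XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            r.nextTag(); // position on the root element
            r.require(XMLStreamConstants.START_ELEMENT, null, "ResponseRoot");
            while (r.nextTag() != XMLStreamConstants.END_ELEMENT) {
                switch (r.getLocalName()) {
                    case "folder":  parseFolder(r, rows); break;
                    case "package": r.getElementText();   break; // skip to its end tag
                    default: throw unexpected(r);
                }
            }
        } finally {
            r.close();
        }
        return rows;
    }

    // folder contains nested folders, querySubjects and filters
    private static void parseFolder(XMLStreamReader r, List<String> rows) throws XMLStreamException {
        while (r.nextTag() != XMLStreamConstants.END_ELEMENT) {
            switch (r.getLocalName()) {
                case "folder":       parseFolder(r, rows); break;
                case "querySubject": parseItems(r, rows);  break;
                case "filter":       parseLeaf(r, rows);   break;
                default: throw unexpected(r);
            }
        }
    }

    // querySubject and queryItemFolder both contain queryItem and queryItemFolder children
    private static void parseItems(XMLStreamReader r, List<String> rows) throws XMLStreamException {
        while (r.nextTag() != XMLStreamConstants.END_ELEMENT) {
            switch (r.getLocalName()) {
                case "queryItem":       parseLeaf(r, rows);  break;
                case "queryItemFolder": parseItems(r, rows); break;
                default: throw unexpected(r);
            }
        }
    }

    // queryItem and filter carry only attributes; read them before advancing
    private static void parseLeaf(XMLStreamReader r, List<String> rows) throws XMLStreamException {
        rows.add(r.getLocalName() + " " + r.getAttributeValue(null, "_path"));
        r.getElementText(); // advances to the matching end tag; fails if child elements appear
    }

    private static XMLStreamException unexpected(XMLStreamReader r) {
        return new XMLStreamException("Unexpected <" + r.getLocalName() + ">", r.getLocation());
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<ResponseRoot>"
                + "<folder _path=\"/f\">"
                + "<querySubject _path=\"/f/qs\">"
                + "<queryItem _path=\"/f/qs/a\"/>"
                + "<queryItemFolder _path=\"/f/qs/d\"><queryItem _path=\"/f/qs/d/b\"/></queryItemFolder>"
                + "</querySubject>"
                + "<filter _path=\"/f/x\"/>"
                + "</folder>"
                + "<package name=\"p\"/>"
                + "</ResponseRoot>";
        for (String row : parse(xml)) {
            System.out.println(row);
        }
    }
}
```

Because the reader only holds the current event, memory use should stay roughly independent of nesting depth and document size, apart from whatever the caller chooses to collect. The question's loop also looks for measure and calculation elements, which the posted XSD does not declare; they would need their own cases in the switch statements.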

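If only the attributes of certain elements are needed, regardless of where they are nested (as in the question's getElementsByTagName("*") loop), a flat scan over the stream may be enough. This is a minimal sketch under that assumption: the class name FlatStaxSketch, the collectPaths helper, and the sample XML are hypothetical, while the element names come from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class: streaming equivalent of the question's flat DOM loop.
class FlatStaxSketch {

    // Element names taken from the question's DOM code
    private static final Set<String> WANTED =
            new HashSet<>(Arrays.asList("queryItem", "measure", "calculation", "filter"));

    /** Returns the _path attribute of every wanted element, in document order. */
    static List<String> collectPaths(String xml) throws XMLStreamException {
        List<String> paths = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Only start tags matter here; text, comments and end tags are skipped
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && WANTED.contains(reader.getLocalName())) {
                    String path = reader.getAttributeValue(null, "_path");
                    if (path != null) {
                        paths.add(path); // a real program would write a database row here
                    }
                }
            }
        } finally {
            reader.close();
        }
        return paths;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<ResponseRoot>"
                + "<folder _path=\"/f\">"
                + "<querySubject _path=\"/f/qs\">"
                + "<queryItem _path=\"/f/qs/a\"/>"
                + "<queryItemFolder _path=\"/f/qs/g\"><queryItem _path=\"/f/qs/g/b\"/></queryItemFolder>"
                + "</querySubject>"
                + "<filter _path=\"/f/flt\"/>"
                + "</folder>"
                + "</ResponseRoot>";
        System.out.println(collectPaths(xml)); // prints [/f/qs/a, /f/qs/g/b, /f/flt]
    }
}
```

The flat scan trades structure checking for simplicity: it does not verify that elements appear where the schema allows them, so the per-element methods remain the better fit when the parent context (for example the enclosing folder) matters.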