我的任务是修复一个相当恼人的堆内存不足问题。IBM提供了一个用于Java的Cognos SDK,我们查询存储在内容存储库中的所有包,这些包以xml格式返回。然后,我们解析该xml并将其写入sql数据库。分析显示,最严重的内存问题是由Char[]引起的,这不是很有帮助(堆太大,很难分析),但确实指向DOM解析器。
我们谈论的是500-1500个xml文件(从技术上讲,是XML文本流),这些文件嵌套得非常深,大小各不相同,有时结构也不一样。大小从几KB到30MB不等,在大约300个包之后,程序将消耗超过8 GB的内存。我之前的程序员通过在每次xml解析后执行手动System.gc调用来处理这个问题,我希望摆脱这种方式(而且它实际上也不能解决这个问题,只是使它在最小的500个包服务器上可行)。
我尝试使用JAXB,但它有一个奇怪的结构,这使得在这里使用它非常困难(它有一些“文件夹或querySubject”的东西在运行)。上周我尝试了STAX几个小时,但还是不能很好地工作,WoodStox也是如此。我真的找不到这样做的例子或教程。接下来我研究了JDOM (因为我读到它的内存处理比纯DOM要好得多),但是我不知道如何让它像DOM那样深入地解析。当前DOM解析:
is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
xmlDoc = builder.parse(is);
is.close();
String _path, datatype, regularAggregate, description, formula;
String table, tableLoc;
NodeList elements = xmlDoc.getElementsByTagName("*");
for (int j = 0; j < elements.getLength(); j++) {
Element element = (Element) elements.item(j);
String nodeName = element.getNodeName();
if (nodeName=="queryItem" || nodeName=="measure"||
nodeName=="calculation" || nodeName=="filter") {
if (element.hasAttribute("_path")) {
path = element.getAttribute("_path"));
}
对于每个属性,依此类推
我的JDOM尝试。目前,它只打印根元素,我还不能比第一个子层更深入:
SAXBuilder saxBuilder = new SAXBuilder();
Document document = saxBuilder.build(inputFile);
System.out.println("Root element :" + document.getRootElement().getName());
Element root = document.getRootElement();
List<Element> rList = root.getChildren("folder");
if (rList!= null) {
for (Element node : rList) {
List<Element> elements = node.getChildren("queryItem");
if (elements!=null) {
for (Element a:elements) {
System.out.println(a.getAttribute("_path"));
}
elements.size();
rList.removeAll(elements);
}
}
生成随机包的xsd结构:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="ResponseRoot">
<xs:complexType>
<xs:sequence>
<xs:element ref="folder"/>
<xs:element ref="package"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="package">
<xs:complexType>
<xs:attribute name="description" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="folder">
<xs:complexType>
<xs:sequence>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="folder"/>
<xs:element ref="querySubject"/>
</xs:choice>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="filter"/>
</xs:sequence>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="isNamespace" use="required" type="xs:integer"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="querySubject">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="queryItem"/>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="queryItemFolder"/>
</xs:sequence>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="filter">
<xs:complexType>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="expression" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="queryItem">
<xs:complexType>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="currency" use="required"/>
<xs:attribute name="datatype" use="required" type="xs:NCName"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="displayType" use="required" type="xs:NCName"/>
<xs:attribute name="expression" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="promptCascadeOnRef" use="required"/>
<xs:attribute name="promptDisplayItemRef" use="required"/>
<xs:attribute name="promptFilterItemRef" use="required"/>
<xs:attribute name="promptType" use="required" type="xs:NCName"/>
<xs:attribute name="regularAggregate" use="required" type="xs:NCName"/>
<xs:attribute name="screenTip" use="required"/>
<xs:attribute name="unSortable" use="required" type="xs:integer"/>
<xs:attribute name="usage" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
xs:element name="queryItemFolder">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="queryItem"/>
<xs:element ref="queryItemFolder"/>
</xs:choice>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
https://stackoverflow.com/questions/52614082
复制相似问题