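I am trying to create an automated MS Word (.docx) file using Apache POI. The input to the Java program contains text, images, and LaTeX-style equations (embedded in $ or $$).
My question is how to add such a LaTeX-style equation to the Word document, so that when the .docx file is edited in MS Word, Word recognizes it as a native Word equation (OMML).
Note: I think the approach should be to convert the LaTeX equation to MathML. If so, how can the MathML be added to the .docx?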
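Whichever converter is used later, the formulas first have to be cut out of the surrounding text. The following is a minimal, JDK-only sketch. It assumes that formulas are delimited by `$…$` (inline) or `$$…$$` (display) and that an escaped dollar `\$` is literal text. `MathSegmenter` and `split` are hypothetical names and are not part of Apache POI. Nested or unbalanced delimiters and `\[…\]` are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Apache POI): splits mixed input into
// plain-text runs and LaTeX formulas delimited by $...$ or $$...$$.
final class MathSegmenter {

    /** One piece of the input: plain text, or a formula without its delimiters. */
    static final class Segment {
        final boolean math;     // true if this segment is a LaTeX formula
        final boolean display;  // true for $$...$$, false for $...$ and plain text
        final String content;

        Segment(boolean math, boolean display, String content) {
            this.math = math;
            this.display = display;
            this.content = content;
        }

        @Override
        public String toString() {
            return (math ? (display ? "DISPLAY" : "INLINE") : "TEXT") + "[" + content + "]";
        }
    }

    // The $$...$$ alternative comes first so display math is not read as inline math.
    // A dollar sign preceded by a backslash (\$) is not treated as a delimiter.
    private static final Pattern MATH = Pattern.compile(
            "(?<!\\\\)\\$\\$(.+?)(?<!\\\\)\\$\\$|(?<!\\\\)\\$(.+?)(?<!\\\\)\\$",
            Pattern.DOTALL);

    static List<Segment> split(String input) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = MATH.matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                // text between the previous formula and this one
                segments.add(new Segment(false, false, input.substring(last, m.start())));
            }
            boolean display = m.group(1) != null;
            segments.add(new Segment(true, display, display ? m.group(1) : m.group(2)));
            last = m.end();
        }
        if (last < input.length()) {
            segments.add(new Segment(false, false, input.substring(last)));
        }
        return segments;
    }

    public static void main(String[] args) {
        String input = "The Pythagorean theorem: $a^2 + b^2 = c^2$ and $$x=\\frac{-b}{2a}$$.";
        for (Segment s : split(input)) {
            System.out.println(s);
        }
    }
}
```

Text segments would become ordinary XWPFRun text. The converters in the answer below are called with the delimiters included (for example "$a^2 + b^2 = c^2$"), so a math segment's delimiters would have to be added back before conversion.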
Posted on 2017-10-08 07:37:32
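Microsoft provides XSLT stylesheets for converting OMML to MathML (OMML2MML.XSL) and for converting MathML to OMML (MML2OMML.XSL).
If you have Microsoft Office installed, you will find these files in the Office program directory. On my system:
Using this stylesheet, we can convert MathML to OMML with XSLT.
Example: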
```java
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.xmlbeans.XmlCursor;

/*
needs the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
*/
public class CreateWordFormulaFromMathML {

    // MML2OMML.XSL from the Office program directory, here expected in the working directory
    static File stylesheet = new File("MML2OMML.XSL");
    static TransformerFactory tFactory = TransformerFactory.newInstance();
    static StreamSource stylesource = new StreamSource(stylesheet);

    // Converts a MathML string into an OMML <m:oMath> element using MML2OMML.XSL
    static CTOMath getOMML(String mathML) throws Exception {
        Transformer transformer = tFactory.newTransformer(stylesource);

        StringReader stringreader = new StringReader(mathML);
        StreamSource source = new StreamSource(stringreader);

        StringWriter stringwriter = new StringWriter();
        StreamResult result = new StreamResult(stringwriter);

        transformer.transform(source, result);

        String ooML = stringwriter.toString();
        stringwriter.close();

        // the stylesheet produces an <m:oMathPara>; take its first <m:oMath>
        CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
        CTOMath ctOMath = ctOMathPara.getOMathArray(0);

        //for making this to work with Office 2007 Word also, special font settings are necessary
        XmlCursor xmlcursor = ctOMath.newCursor();
        while (xmlcursor.hasNextToken()) {
            XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
            if (tokentype.isStart()) {
                if (xmlcursor.getObject() instanceof CTR) {
                    CTR cTR = (CTR)xmlcursor.getObject();
                    cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
                    cTR.getRPr2().getRFonts().setHAnsi("Cambria Math"); // up to apache poi 4.1.2
                    //cTR.getRPr2().getRFontsArray(0).setHAnsi("Cambria Math"); // since apache poi 5.0.0
                }
            }
        }

        return ctOMath;
    }

    public static void main(String[] args) throws Exception {

        XWPFDocument document = new XWPFDocument();

        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText("The Pythagorean theorem: ");

        String mathML =
            "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
           +"<mrow>"
           +"<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo><msup><mi>b</mi><mn>2</mn></msup><mo>=</mo><msup><mi>c</mi><mn>2</mn></msup>"
           +"</mrow>"
           +"</math>";

        CTOMath ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        // add the <m:oMath> element to the paragraph, after the text run
        CTP ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("The Quadratic Formula: ");

        mathML =
            "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
           +"<mrow>"
           +"<mi>x</mi><mo>=</mo><mfrac><mrow><mrow><mo>-</mo><mi>b</mi></mrow><mo>±</mo><msqrt><mrow><msup><mi>b</mi><mn>2</mn></msup><mo>-</mo><mrow><mn>4</mn><mo></mo><mi>a</mi><mo></mo><mi>c</mi></mrow></mrow></msqrt></mrow><mrow><mn>2</mn><mo></mo><mi>a</mi></mrow></mfrac>"
           +"</mrow>"
           +"</math>";

        ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        // try-with-resources closes the stream even if writing fails
        try (FileOutputStream out = new FileOutputStream("CreateWordFormulaFromMathML.docx")) {
            document.write(out);
        }
        document.close();
    }
}
```
Note that this code needs the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar, as mentioned in https://poi.apache.org/faq.html#faq-N10025.
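In getOMML above, tFactory.newTransformer(stylesource) parses and compiles MML2OMML.XSL again for every formula. One possible refinement, sketched here with the JDK alone, is to compile the stylesheet once into a `javax.xml.transform.Templates` object, which is thread-safe, and to create a cheap `Transformer` per call. `XsltConverter` is a hypothetical name. `DEMO_XSL` is a tiny stand-in stylesheet, not MML2OMML.XSL, that only collects the text of the MathML `<mi>` elements. In the real program the constructor argument would be `new StreamSource(new File("MML2OMML.XSL"))`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: compile a stylesheet once, then reuse it for every formula.
final class XsltConverter {

    private final Templates templates; // compiled stylesheet; safe to share between threads

    XsltConverter(Source stylesheet) throws TransformerException {
        this.templates = TransformerFactory.newInstance().newTemplates(stylesheet);
    }

    String transform(String xml) throws TransformerException {
        // A Transformer is not thread-safe, so a fresh (cheap) one is created per call
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Stand-in stylesheet (NOT MML2OMML.XSL): concatenates the text of all MathML <mi> elements.
    static final String DEMO_XSL =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:mml=\"http://www.w3.org/1998/Math/MathML\""
            + " exclude-result-prefixes=\"mml\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/\">"
            + "<out><xsl:for-each select=\"//mml:mi\"><xsl:value-of select=\".\"/></xsl:for-each></out>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    public static void main(String[] args) throws TransformerException {
        XsltConverter converter = new XsltConverter(new StreamSource(new StringReader(DEMO_XSL)));
        System.out.println(converter.transform(
                "<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mi>a</mi><mo>+</mo><mi>b</mi></math>"));
        // No MathML namespace: the mml:mi pattern matches nothing, so <out> stays empty
        System.out.println(converter.transform("<math><mi>a</mi></math>"));
    }
}
```

With the stand-in stylesheet, the second call in main also shows why the namespace matters. Its patterns match only elements in the MathML namespace, so a `<math>` element without `xmlns` yields an empty result instead of an error.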
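Of course, there are also Java libraries for converting LaTeX to MathML, for example http://www.fmath.info/java/download.jsp.
After downloading fmath-mathml-java-test-project-b1124.zip and putting /lib/fmath-mathml-java.jar and /lib/jdom-2.0.6.jar on the classpath, the following works: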
```java
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.xmlbeans.XmlCursor;

/*
needs the full ooxml-schemas-1.3.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
*/
public class CreateWordFormulaFromLaTeX {

    static File stylesheet = new File("MML2OMML.XSL");
    static TransformerFactory tFactory = TransformerFactory.newInstance();
    static StreamSource stylesource = new StreamSource(stylesheet);

    // same MathML -> OMML conversion as in the first example
    static CTOMath getOMML(String mathML) throws Exception {
        Transformer transformer = tFactory.newTransformer(stylesource);

        StringReader stringreader = new StringReader(mathML);
        StreamSource source = new StreamSource(stringreader);

        StringWriter stringwriter = new StringWriter();
        StreamResult result = new StreamResult(stringwriter);

        transformer.transform(source, result);

        String ooML = stringwriter.toString();
        stringwriter.close();

        CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
        CTOMath ctOMath = ctOMathPara.getOMathArray(0);

        //for making this to work with Office 2007 Word also, special font settings are necessary
        XmlCursor xmlcursor = ctOMath.newCursor();
        while (xmlcursor.hasNextToken()) {
            XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
            if (tokentype.isStart()) {
                if (xmlcursor.getObject() instanceof CTR) {
                    CTR cTR = (CTR)xmlcursor.getObject();
                    cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
                    cTR.getRPr2().getRFonts().setHAnsi("Cambria Math");
                }
            }
        }

        return ctOMath;
    }

    public static void main(String[] args) throws Exception {

        XWPFDocument document = new XWPFDocument();

        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText("The Pythagorean theorem: ");

        String latex = "$a^2 + b^2 = c^2$";

        // LaTeX -> MathML with fmath
        String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
        // fmath omits the MathML namespace, which the XSLT needs
        mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
        System.out.println(mathML);

        CTOMath ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        CTP ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("The Quadratic Formula: ");

        latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";

        mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
        mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
        // fmath emits the HTML entity &plusmn;, which the XSLT input parser does not know
        mathML = mathML.replaceAll("&plusmn;", "\u00B1");
        System.out.println(mathML);

        ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        // try-with-resources closes the stream, which the original one-liner never did
        try (FileOutputStream out = new FileOutputStream("CreateWordFormulaFromLaTeX.docx")) {
            document.write(out);
        }
        document.close();
    }
}
```
But every conversion can introduce errors, so LaTeX -> MathML -> OMML is more error-prone than MathML -> OMML alone.
In this case, fmath.conversion.ConvertFromLatexToMathML.convertToMathML produces Math XML without a namespace. Since the XSLT needs the namespace, it has to be added manually.
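The `replaceFirst("<math ", …)` call in the fmath example only matches when the `<math` start tag is followed by a space, meaning it has attributes. If an `xmlns` attribute were already present, it would add a second one and make the XML ill-formed. The following is a slightly more defensive sketch using only the JDK. `MathMLNamespace.ensureNamespace` is a hypothetical helper, and it assumes an unprefixed `<math>` root element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: makes sure the root <math> element declares the MathML namespace.
final class MathMLNamespace {

    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    // The first unprefixed <math ...> start tag; group 1 holds its attributes (may be empty or "/")
    private static final Pattern MATH_START = Pattern.compile("<math(?=[\\s>/])([^>]*)>");

    static String ensureNamespace(String mathML) {
        Matcher m = MATH_START.matcher(mathML);
        if (!m.find()) {
            throw new IllegalArgumentException("no <math> start tag found");
        }
        String attributes = m.group(1);
        if (attributes.matches("(?s).*\\sxmlns\\s*=.*")) {
            return mathML; // a default namespace is already declared; leave it unchanged
        }
        String startTag = "<math xmlns=\"" + MATHML_NS + "\"" + attributes + ">";
        return mathML.substring(0, m.start()) + startTag + mathML.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(ensureNamespace("<math><mi>a</mi></math>"));
        System.out.println(ensureNamespace("<math display=\"block\"><mi>x</mi></math>"));
    }
}
```

A regular expression is enough here because only the root start tag is touched. A full XML parser would be more robust, but it cannot be used on input that is not yet well-formed.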
fmath.conversion.ConvertFromLatexToMathML.convertToMathML also uses HTML entities that MML2OMML.XSL does not know. So in this example, `&plusmn;` has to be replaced by `±`.
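Instead of replacing each entity by hand, known HTML entities can be mapped to their characters before the XSLT step. The following is a minimal sketch. It assumes the converter only emits entities from a small table, and the table shown is illustrative and incomplete. `HtmlEntities.replace` is a hypothetical helper. Unknown entities and the five predefined XML entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) are left untouched. The code needs Java 9+ for `Map.of` and `Matcher.appendReplacement(StringBuilder, …)`.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns HTML named entities, which an XML parser rejects
// unless they are declared, into the characters they stand for.
final class HtmlEntities {

    // Illustrative, incomplete table (entity name -> character); extend as needed.
    private static final Map<String, String> ENTITIES = Map.of(
            "plusmn", "\u00B1",
            "times", "\u00D7",
            "minus", "\u2212",
            "le", "\u2264",
            "ge", "\u2265",
            "ne", "\u2260",
            "infin", "\u221E");

    // &name; with a letter first, so numeric references like &#xB1; are not touched
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String replace(String xml) {
        Matcher m = ENTITY.matcher(xml);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String ch = ENTITIES.get(m.group(1));
            // unknown names, and XML's own &amp; &lt; &gt; &quot; &apos;, stay as they are
            m.appendReplacement(sb, Matcher.quoteReplacement(ch != null ? ch : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("<mo>&plusmn;</mo><mo>&amp;</mo>"));
    }
}
```

In the fmath example, a call like this could take the place of the hand-written replaceAll for the entity. Any entity still left over will make the XSLT input fail to parse, so missing table entries show up as errors rather than as silently wrong formulas.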
Maybe SnuggleTeX would be a better library?
After downloading it and putting snuggletex-core-1.2.2.jar on the classpath, the following changes to my previous example work:
```java
...
        String latex = "$a^2 + b^2 = c^2$";

        // LaTeX -> MathML with SnuggleTeX instead of fmath
        uk.ac.ed.ph.snuggletex.SnuggleEngine engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
        uk.ac.ed.ph.snuggletex.SnuggleSession session = engine.createSession();
        uk.ac.ed.ph.snuggletex.SnuggleInput input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
        session.parseInput(input);
        String mathML = session.buildXMLString();
        System.out.println(mathML);

        /*
        String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
        mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
        System.out.println(mathML);
        */

        CTOMath ctOMath = getOMML(mathML);
        System.out.println(ctOMath);
...
        latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";

        engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
        session = engine.createSession();
        input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
        session.parseInput(input);
        mathML = session.buildXMLString();
        System.out.println(mathML);

        /*
        mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
        mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
        mathML = mathML.replaceAll("&plusmn;", "\u00B1");
        System.out.println(mathML);
        */

        ctOMath = getOMML(mathML);
        System.out.println(ctOMath);
...
```
No manual intervention is needed, at least not with the given LaTeX examples.
https://stackoverflow.com/questions/46623554