Adding LaTeX-style equations to Word (.docx) using Apache POI

Asked by a Stack Overflow user on 2017-10-07 18:16:17

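I am trying to generate an MS Word (.docx) file automatically using Apache POI. The input to the Java program contains text, images, and LaTeX-style equations (embedded in $ or $$).

My question is how to add such a LaTeX-style equation to the Word file so that, when the .docx file is edited in MS Word, Word recognizes it as a native MS Word equation (OMML type).

Note: I think the approach should be to convert the LaTeX equation to MathML. If so, how can MathML be added to the .docx?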

1 Answer

Accepted answer, posted by a Stack Overflow user on 2017-10-08 07:37:32

Microsoft provides XSLT stylesheets for transforming OMML into MathML (OMML2MML.XSL) and MathML into OMML (MML2OMML.XSL).

If you have Microsoft Office installed, you will find these files in the Office program directory.
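The exact location depends on the Office version and installation, so it can help to search for the stylesheet rather than hard-code a path. The following is a minimal sketch using only the JDK; the class name `StylesheetLocator` is hypothetical, and the directory to search has to be supplied by you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of Apache POI or Microsoft Office.
class StylesheetLocator {

    /**
     * Walks searchRoot recursively and returns the first regular file whose
     * name equals fileName, ignoring case. Unreadable subdirectories make
     * Files.walk throw an UncheckedIOException; this sketch does not handle that.
     */
    static Optional<Path> find(Path searchRoot, String fileName) throws IOException {
        try (Stream<Path> paths = Files.walk(searchRoot)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().equalsIgnoreCase(fileName))
                .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass your own Office program directory as the first argument.
        Path searchRoot = Paths.get(args.length > 0 ? args[0] : ".");
        Optional<Path> xsl = find(searchRoot, "MML2OMML.XSL");
        System.out.println(xsl.map(Path::toString)
            .orElse("MML2OMML.XSL not found under " + searchRoot));
    }
}
```

Copying the file that is found next to your program matches what the example code in this answer expects, since it opens `MML2OMML.XSL` from the working directory.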

Using MML2OMML.XSL, we can transform MathML into OMML with XSLT.

Example:

```java
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.xmlbeans.XmlCursor;

/*
needs the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
*/

public class CreateWordFormulaFromMathML {

 // MML2OMML.XSL is expected in the working directory
 static File stylesheet = new File("MML2OMML.XSL");
 static TransformerFactory tFactory = TransformerFactory.newInstance();
 static StreamSource stylesource = new StreamSource(stylesheet);

 // transforms a MathML string into an OMML object using MML2OMML.XSL
 static CTOMath getOMML(String mathML) throws Exception {
  Transformer transformer = tFactory.newTransformer(stylesource);

  StringReader stringreader = new StringReader(mathML);
  StreamSource source = new StreamSource(stringreader);

  StringWriter stringwriter = new StringWriter();
  StreamResult result = new StreamResult(stringwriter);
  transformer.transform(source, result);

  String ooML = stringwriter.toString();
  stringwriter.close();

  CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
  CTOMath ctOMath = ctOMathPara.getOMathArray(0);

  //for making this to work with Office 2007 Word also, special font settings are necessary
  XmlCursor xmlcursor = ctOMath.newCursor();
  while (xmlcursor.hasNextToken()) {
   XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
   if (tokentype.isStart()) {
    if (xmlcursor.getObject() instanceof CTR) {
     CTR cTR = (CTR)xmlcursor.getObject();
     cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
     cTR.getRPr2().getRFonts().setHAnsi("Cambria Math"); // up to apache poi 4.1.2
     //cTR.getRPr2().getRFontsArray(0).setHAnsi("Cambria Math"); // since apache poi 5.0.0
    }
   }
  }

  return ctOMath;
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument();

  XWPFParagraph paragraph = document.createParagraph();
  XWPFRun run = paragraph.createRun();
  run.setText("The Pythagorean theorem: ");

  String mathML =
    "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
   +"<mrow>"
   +"<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo><msup><mi>b</mi><mn>2</mn></msup><mo>=</mo><msup><mi>c</mi><mn>2</mn></msup>"
   +"</mrow>"
   +"</math>";

  CTOMath ctOMath = getOMML(mathML);
  System.out.println(ctOMath);

  // attach the formula to the paragraph that already holds the text run
  CTP ctp = paragraph.getCTP();
  ctp.setOMathArray(new CTOMath[]{ctOMath});

  paragraph = document.createParagraph();
  run = paragraph.createRun();
  run.setText("The Quadratic Formula: ");

  // &#x2062; is the invisible times operator (U+2062), written as a numeric character reference
  mathML =
    "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
   +"<mrow>"
   +"<mi>x</mi><mo>=</mo><mfrac><mrow><mrow><mo>-</mo><mi>b</mi></mrow><mo>±</mo><msqrt><mrow><msup><mi>b</mi><mn>2</mn></msup><mo>-</mo><mrow><mn>4</mn><mo>&#x2062;</mo><mi>a</mi><mo>&#x2062;</mo><mi>c</mi></mrow></mrow></msqrt></mrow><mrow><mn>2</mn><mo>&#x2062;</mo><mi>a</mi></mrow></mfrac>"
   +"</mrow>"
   +"</math>";

  ctOMath = getOMML(mathML);
  System.out.println(ctOMath);

  ctp = paragraph.getCTP();
  ctp.setOMathArray(new CTOMath[]{ctOMath});

  // try-with-resources closes the stream even if writing fails
  try (FileOutputStream out = new FileOutputStream("CreateWordFormulaFromMathML.docx")) {
   document.write(out);
  }
  document.close();

 }
}
```

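Note that this code needs the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar, as mentioned in https://poi.apache.org/faq.html#faq-N10025.

Of course, there are also Java libraries for converting LaTeX to MathML. For example: http://www.fmath.info/java/download.jsp

After downloading fmath-mathml-java-test-project-b1124.zip and putting /lib/fmath-mathml-java.jar and /lib/jdom-2.0.6.jar on the classpath, the following works: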

```java
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.xmlbeans.XmlCursor;

/*
needs the full ooxml-schemas-1.3.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
*/

public class CreateWordFormulaFromLaTeX {

 // MML2OMML.XSL is expected in the working directory
 static File stylesheet = new File("MML2OMML.XSL");
 static TransformerFactory tFactory = TransformerFactory.newInstance();
 static StreamSource stylesource = new StreamSource(stylesheet);

 // transforms a MathML string into an OMML object using MML2OMML.XSL
 static CTOMath getOMML(String mathML) throws Exception {
  Transformer transformer = tFactory.newTransformer(stylesource);

  StringReader stringreader = new StringReader(mathML);
  StreamSource source = new StreamSource(stringreader);

  StringWriter stringwriter = new StringWriter();
  StreamResult result = new StreamResult(stringwriter);
  transformer.transform(source, result);

  String ooML = stringwriter.toString();
  stringwriter.close();

  CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
  CTOMath ctOMath = ctOMathPara.getOMathArray(0);

  //for making this to work with Office 2007 Word also, special font settings are necessary
  XmlCursor xmlcursor = ctOMath.newCursor();
  while (xmlcursor.hasNextToken()) {
   XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
   if (tokentype.isStart()) {
    if (xmlcursor.getObject() instanceof CTR) {
     CTR cTR = (CTR)xmlcursor.getObject();
     cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
     cTR.getRPr2().getRFonts().setHAnsi("Cambria Math");
    }
   }
  }

  return ctOMath;
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument();

  XWPFParagraph paragraph = document.createParagraph();
  XWPFRun run = paragraph.createRun();
  run.setText("The Pythagorean theorem: ");

  String latex = "$a^2 + b^2 = c^2$";

  String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
  // the converter emits <math> without a namespace, but the XSLT needs it
  mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
  System.out.println(mathML);

  CTOMath ctOMath = getOMML(mathML);
  System.out.println(ctOMath);

  CTP ctp = paragraph.getCTP();
  ctp.setOMathArray(new CTOMath[]{ctOMath});

  paragraph = document.createParagraph();
  run = paragraph.createRun();
  run.setText("The Quadratic Formula: ");

  latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";

  mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
  mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
  // the HTML entity &plusmn; is unknown to the XSLT, so use the literal character
  mathML = mathML.replace("&plusmn;", "±");
  System.out.println(mathML);

  ctOMath = getOMML(mathML);
  System.out.println(ctOMath);

  ctp = paragraph.getCTP();
  ctp.setOMathArray(new CTOMath[]{ctOMath});

  // try-with-resources closes the stream even if writing fails
  try (FileOutputStream out = new FileOutputStream("CreateWordFormulaFromLaTeX.docx")) {
   document.write(out);
  }
  document.close();

 }
}
```

But every conversion step is a possible source of errors. So LaTeX -> MathML -> OMML is more error-prone than MathML -> OMML.
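One way to catch such errors early is to inspect the intermediate MathML before it reaches the XSLT. The sketch below uses only the JDK's XML parser; the class `MathMLCheck` and its method are hypothetical names, and the two checks (well-formedness, and a namespaced `math` root element) are an assumption about what is worth verifying, not an exhaustive validation.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports the first problem found in a MathML string.
class MathMLCheck {

    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    /** Returns a description of the first problem found, or empty if none. */
    static Optional<String> findProblem(String mathML) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());

        Element root;
        try {
            root = builder.parse(new InputSource(new StringReader(mathML)))
                          .getDocumentElement();
        } catch (SAXException e) {
            // Undeclared entities such as &plusmn; also end up here.
            return Optional.of("not well-formed XML: " + e.getMessage());
        }
        if (!"math".equals(root.getLocalName())) {
            return Optional.of("root element is not <math>");
        }
        if (!MATHML_NS.equals(root.getNamespaceURI())) {
            return Optional.of("<math> is not in the MathML namespace");
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        String mathML = "<math><mi>a</mi></math>";
        System.out.println(findProblem(mathML).orElse("looks fine"));
    }
}
```

Calling `findProblem` on each converter result before `getOMML` turns a silent bad formula into an explicit message.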

In this case, fmath.conversion.ConvertFromLatexToMathML.convertToMathML produces MathML without a namespace. But since the XSLT needs the namespace, it must be added manually.
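The `replaceFirst("<math ", ...)` call in the example only matches when the opening tag carries at least one attribute. A slightly more defensive variant could look like the sketch below. The class `MathMLNamespace` is a hypothetical helper; it works on the string level only and assumes the namespace, if present, is written as `xmlns=` without spaces around the equals sign.

```java
// Hypothetical helper: adds the MathML namespace to the opening <math> tag if it is missing.
class MathMLNamespace {

    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    static String ensureNamespace(String mathML) {
        int start = mathML.indexOf("<math");
        if (start < 0) {
            return mathML; // no <math> element at all, leave input untouched
        }
        int afterName = start + "<math".length();
        int end = mathML.indexOf('>', start);
        if (end < 0 || afterName > end) {
            return mathML; // malformed opening tag, leave input untouched
        }
        char next = mathML.charAt(afterName);
        if (next != '>' && next != '/' && !Character.isWhitespace(next)) {
            return mathML; // some other element whose name merely starts with "math"
        }
        String openTag = mathML.substring(start, end);
        if (openTag.contains("xmlns=")) {
            return mathML; // a default namespace is already declared
        }
        return mathML.substring(0, afterName)
             + " xmlns=\"" + MATHML_NS + "\""
             + mathML.substring(afterName);
    }

    public static void main(String[] args) {
        System.out.println(ensureNamespace("<math><mi>a</mi></math>"));
        System.out.println(ensureNamespace("<math display=\"block\"><mi>a</mi></math>"));
    }
}
```

Because an already namespaced string is returned unchanged, the helper can be applied to the output of any converter without first checking which one produced it.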

fmath.conversion.ConvertFromLatexToMathML.convertToMathML also uses HTML entities that MML2OMML.XSL does not know. So in this example, "&plusmn;" must be replaced with "±".
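If more formulas are converted, other named entities may show up as well. A small lookup table that maps them to numeric character references, which any XML parser understands without a DTD, is one way to generalize the single replacement. This is only a sketch: the class `HtmlEntityFixer` is hypothetical, and apart from `&plusmn;` the listed entities are assumptions about what a converter might emit, not something observed in fmath output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: replaces a few named HTML entities with numeric character references.
class HtmlEntityFixer {

    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&plusmn;", "&#xB1;");   // plus-minus sign, the case from the example
        ENTITIES.put("&times;", "&#xD7;");    // multiplication sign (assumed candidate)
        ENTITIES.put("&minus;", "&#x2212;");  // minus sign (assumed candidate)
        ENTITIES.put("&infin;", "&#x221E;");  // infinity (assumed candidate)
    }

    static String replaceNamedEntities(String mathML) {
        String result = mathML;
        for (Map.Entry<String, String> entry : ENTITIES.entrySet()) {
            // String.replace does literal replacement, no regex involved
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceNamedEntities("<mo>&plusmn;</mo>"));
    }
}
```

The five entities predefined by XML itself, such as `&amp;` and `&lt;`, are deliberately left alone because the parser already resolves them.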

Maybe SnuggleTeX would be a better library?

After downloading it and putting snuggletex-core-1.2.2.jar on the classpath, the following code changes to my previous example work:

```java
...
  String latex = "$a^2 + b^2 = c^2$";

  // SnuggleTeX replaces the fmath conversion
  uk.ac.ed.ph.snuggletex.SnuggleEngine engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
  uk.ac.ed.ph.snuggletex.SnuggleSession session = engine.createSession();
  uk.ac.ed.ph.snuggletex.SnuggleInput input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
  session.parseInput(input);
  String mathML = session.buildXMLString();
  System.out.println(mathML);

/*
  String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
  mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
  System.out.println(mathML);
*/

  CTOMath ctOMath = getOMML(mathML);
  System.out.println(ctOMath);

...

  latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";

  engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
  session = engine.createSession();
  input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
  session.parseInput(input);
  mathML = session.buildXMLString();
  System.out.println(mathML);

/*
  mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
  mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
  mathML = mathML.replaceAll("&plusmn;", "±");
  System.out.println(mathML);
*/

  ctOMath = getOMML(mathML);
  System.out.println(ctOMath);
...
```

No manual intervention is needed here, at least not with the given LaTeX examples.
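The question's input mixes plain text with LaTeX in `$` delimiters, so before any conversion the input has to be split into text parts and formula parts. A minimal sketch of that step follows; the class `LatexSegments` is hypothetical, support for `$$...$$` is an assumption, and escaped dollar signs (`\$`) are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits mixed input into plain-text and LaTeX segments.
class LatexSegments {

    static final class Segment {
        final boolean math;    // true for a formula, false for plain text
        final String content;  // formula content without its delimiters

        Segment(boolean math, String content) {
            this.math = math;
            this.content = content;
        }

        @Override
        public String toString() {
            return (math ? "MATH:" : "TEXT:") + content;
        }
    }

    // $$...$$ is tried first so that it is not misread as two empty $...$ pairs.
    private static final Pattern MATH =
        Pattern.compile("\\$\\$(.+?)\\$\\$|\\$(.+?)\\$", Pattern.DOTALL);

    static List<Segment> split(String input) {
        List<Segment> segments = new ArrayList<>();
        Matcher matcher = MATH.matcher(input);
        int last = 0;
        while (matcher.find()) {
            if (matcher.start() > last) {
                segments.add(new Segment(false, input.substring(last, matcher.start())));
            }
            String latex = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            segments.add(new Segment(true, latex));
            last = matcher.end();
        }
        if (last < input.length()) {
            segments.add(new Segment(false, input.substring(last)));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment s : split("The Pythagorean theorem: $a^2 + b^2 = c^2$ holds.")) {
            System.out.println(s);
        }
    }
}
```

Text segments would go into `XWPFRun.setText`, and formula segments through the LaTeX -> MathML -> OMML chain. The examples in this answer pass the formula to the converter with its `$` delimiters included, so the stripped content may need to be wrapped again before conversion.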

Original content provided by Stack Overflow. Original link: https://stackoverflow.com/questions/46623554
