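I want to split an RTF file (using C# or VB.Net) into two or more parts, using the string [BreakPage] as the delimiter. For example, this file contains one [BreakPage] and needs to be split into two parts: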
{\rtf1\ansi\ansicpg1251\uc1\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1049\deflangfe1049{\fonttbl{\f0\froman\fcharset204\fprq2{*\panose 02020603050405020304}
新罗马;}{F38\fcharset0\fprq2 2次新罗马;}{f41 36\froman\fcharset238\fprq2 2次新罗马CE;}{F39\fcharset0 161\fprq2 2次新罗马希腊语;}{F40\froman\fcharset0 162\fprq2 2次新罗马图尔;}{\F41\froman\fcharset177\fprq2次新罗马(希伯来语);}{F42\froman\fcharset178\fprq2 2倍于新罗马(阿拉伯语);}{F43\fcharset186\fprq2 2倍于新罗马波罗的海;}{ff44\froman\fcharset163 163\fprq2 2倍新罗马(Vietnamese);}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 \snext0 0普通;}{*\cs10 10\相加\ssemi隐含默认段落字体;}{*\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 11\ssemi隐含正常Table;}}{*\latentstyles\lsdstimax156\lsdlockeddef0}{*\rsidtbl \rsid2111663\rsid7154806 \rsid15558346}{*\生成器Microsoft 11.0.5604;}{\info{\作者程序员}{\操作符Programmer}{\creatim\yr2011\mo8\dy2\hr12\min45}{\revtim\yr2011\mo8\dy5\hr12\min34}{\version3}{\edmins1}{\nofpages1}{\nofwords5}{\nofchars34}{\nofcharsws38} {\vern24689}}\margl1701\margr850\margt1134\margb1134 \widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\hyphcaps0\horzdoc\dghspace120\dgvspace120\dghorigin1701\dgvorigin1984\dghshow0\dgvshow3 \jcompress\viewkind1 1\sftnbj {*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta .}}{*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta .}}{*\pnseclvl3 \pnindent720\pnindent720\pntxta 
.}}{*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta .}}{*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta (})}{*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {pntxtb(}{pntxta)} {*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{pntxta )}}{*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta)}}{*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta)}}{*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang(}{pntxta )}}{*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang (}{pntxta )}} )fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 {\b\rsid7154806\charrsid7154806线1 \par }{\rsid7154806 \par }{\i\rsid7154806\charrsid7154806 Line3}{\lang1048\langfe1049\langnp1048\insrsid7154806 \par }{\lang1048\langfe1049\langnp1048\insrsid2111663 BreakPage \{\insrsid7154806 Line4 \par Line5 \par }
Can anyone help me?
Thanks!
Posted on 2011-08-05 09:57:10
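The problem is that RTF keeps some (but not necessarily all) of its formatting information in a global header. To split the RTF text so that each result is valid RTF again, with the formatting still applied, you need to know where the header information is and copy it into every chunk.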
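There are two ways to do this: (1) parse the RTF yourself, or (2) let a RichTextBox do the work.
(1) is feasible, but takes time. Fortunately, RTF parsers already exist, for example one on CodeProject.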
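Alternatively, you can load the RTF text into a RichTextBox, search the RichTextBox for the split text "[BreakPage]", programmatically select the first part and then the second part, and retrieve the RTF of each selection through the SelectedRtf property.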
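The RichTextBox approach could look roughly like the sketch below. It is not a tested implementation: the `RtfSplitter` class, the `SplitRtf` method, and the output file names are hypothetical names chosen for illustration. It assumes a Windows Forms project and that the marker appears as plain text in the document. Only `Rtf`, `Find`, `Select`, `SelectedRtf`, and `TextLength` come from the RichTextBox control itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;

static class RtfSplitter
{
    // Hypothetical helper: returns one RTF document per section between markers.
    // The marker text itself is dropped from the output.
    public static List<string> SplitRtf(string rtf, string marker = "[BreakPage]")
    {
        var parts = new List<string>();
        using (var box = new RichTextBox())
        {
            IntPtr handle = box.Handle; // make sure the native control exists
            box.Rtf = rtf;              // the control parses header and body

            int start = 0;
            while (true)
            {
                // Find searches the control's plain text, not the raw RTF.
                // Skip the call once start reaches the end of the text,
                // because an empty search range makes Find scan the whole text.
                int hit = start < box.TextLength
                    ? box.Find(marker, start,
                               RichTextBoxFinds.MatchCase | RichTextBoxFinds.NoHighlight)
                    : -1;

                int end = hit < 0 ? box.TextLength : hit;
                box.Select(start, end - start);

                // SelectedRtf returns a complete RTF document for the selection,
                // including the header entries the selection needs.
                parts.Add(box.SelectedRtf);

                if (hit < 0) break;
                start = hit + marker.Length;
            }
        }
        return parts;
    }
}

static class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the RTF file to split.
        string rtf = File.ReadAllText(args[0]);
        List<string> parts = RtfSplitter.SplitRtf(rtf);
        for (int i = 0; i < parts.Count; i++)
        {
            File.WriteAllText("part" + (i + 1) + ".rtf", parts[i]);
        }
    }
}
```

Because the search runs on the plain text, the marker should still be found when Word has split it across several formatting runs in the raw RTF. The RTF that the control writes back is its own serialization, so it may not match the original header byte for byte.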
https://stackoverflow.com/questions/6954289