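I have a specific requirement: extract text and images from a particular region of a PDF file. The region may be a selected or highlighted area, or it may be defined by a given set of coordinates.
Every approach I found while searching extracts all of the images and text from the PDF, not just the content at a specified location. I tried iTextSharp, Syncfusion, and Aspose, but couldn't find a way to do this.
It would be great if anyone could help me with this. Could you share your thoughts and suggestions on how to implement this in .NET?
Regards, Arun.M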
Posted on 2011-03-23 20:52:41
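This code extracts images from a PDF using the Bytescout PDF Extractor SDK. It enumerates every image in the document and writes the first one to the HTTP response: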
using System;
using Bytescout.PDFExtractor;

namespace ExtractAllImages
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // This test file will be copied to the project directory on the pre-build event (see the project properties).
            String inputFile = Server.MapPath("sample1.pdf");

            // Create Bytescout.PDFExtractor.ImageExtractor instance
            ImageExtractor extractor = new ImageExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document from the mapped server path
            extractor.LoadDocumentFromFile(inputFile);

            Response.Clear();

            int i = 0;

            // Initialize image enumeration
            if (extractor.GetFirstImage())
            {
                do
                {
                    // Only the first image is written: one response can carry only one image body
                    if (i == 0)
                    {
                        string imageFileName = "image" + i + ".png";

                        // Set the headers before writing; no HTML is written here,
                        // because any text in the body would corrupt the PNG data
                        Response.ContentType = "image/png";
                        Response.AddHeader("Content-Disposition", "inline;filename=" + imageFileName);

                        // Write the image bytes into the Response output stream
                        Response.BinaryWrite(extractor.GetCurrentImageAsArrayOfBytes());
                    }

                    i++;
                } while (extractor.GetNextImage()); // Advance image enumeration
            }

            Response.End();
        }
    }
}
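Note that the sample above pulls images from the whole document, so it does not yet address the "specific region" part of the question. For the text half, here is a minimal sketch assuming iTextSharp 5.x (the `iTextSharp.text.pdf.parser` namespace). The class name `RegionTextExtractor` and the method `ExtractTextInRegion` are hypothetical names made up for illustration. The coordinates are assumed to be in PDF user-space points with the origin at the bottom-left corner of the page. As far as I know, `RegionTextRenderFilter` keeps a text chunk only when its baseline falls inside the rectangle, so text straddling the edge may be dropped.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class RegionTextExtractor
{
    // Hypothetical helper (not from the original post): returns the text located
    // inside the given rectangle on a single page.
    // x, y, width, height are in PDF points; (x, y) is the lower-left corner of the region.
    public static string ExtractTextInRegion(string pdfPath, int pageNumber,
                                             float x, float y, float width, float height)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // Rectangle describing the region of interest
            System.util.RectangleJ region = new System.util.RectangleJ(x, y, width, height);

            // Filter that rejects text outside the region
            RenderFilter[] filters = { new RegionTextRenderFilter(region) };

            // Wrap a location-aware strategy so only the filtered text is collected
            ITextExtractionStrategy strategy =
                new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filters);

            // pageNumber is 1-based
            return PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call might look like `string text = RegionTextExtractor.ExtractTextInRegion("sample1.pdf", 1, 70, 80, 420, 500);`, where the numbers are arbitrary placeholders. Images can probably be handled in a similar spirit by implementing `IRenderListener` and comparing the matrix from `ImageRenderInfo.GetImageCTM()` against the region, but that is not shown here.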
https://stackoverflow.com/questions/2424726