
Q: How to extract PDF text in C# using Google Vision

Stack Overflow user

Asked 2021-10-04 23:04:28

Answers 1 · Views 216 · Followers 0 · Votes 0

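I want to use Google Vision to extract a PDF to text/tables. My PDF contains a table that I want to extract (BlockType = table).

However, I don't know how to do this in C#.

I installed the Google.Cloud.Vision.API NuGet package and tried the DetectDocumentText method, but it seems to accept only images.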

var client = new ImageAnnotatorClientBuilder
{
    CredentialsPath = @"myjsonfile.json"
}.Build();

Image image = Image.FromUri("https://storage.cloud.google.com/pathtomyfile.pdf");

TextAnnotation response = client.DetectDocumentText(image); // Getting error for a bad image.

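I then looked for a file-based method and found BatchAnnotateFilesAsync, but I don't know how to build the BatchAnnotateFilesRequest object it requires, and I couldn't find any C# examples.

Can anyone help me figure out how to extract a PDF document to text, including blocks of the table block type?

Thanks in advance.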

.net-core

ocr

google-vision

pdf

1 Answer

Stack Overflow user

Answered 2022-05-17 06:16:28

// Requires: using System; using System.Configuration; using Google.Cloud.Vision.V1;
// Log is the poster's own logging helper and is not defined here.
private string ScanPDFWithGoogle(string path)
      {
          try
          {
              // DetectText works on image files (e.g. PNG); it does not read PDFs directly.
              // Use the path argument rather than a hard-coded file.
              var image = Google.Cloud.Vision.V1.Image.FromFile(path);
              Log.Write("In  photoread try catch block : " + path);
              var credentialPath = ConfigurationManager.AppSettings["GOOGLE_APPLICATION_CREDENTIALS"];
              Log.Write("In  photoread try catch block after credential : " + credentialPath);

              // Build the client from the service-account JSON file.
              ImageAnnotatorClient client = new ImageAnnotatorClientBuilder
              {
                  CredentialsPath = credentialPath
              }.Build();
              Log.Write("Client" + client.ToString());

              var response = client.DetectText(image);
              return response.ToString();
          }
          catch (Exception ex)
          {
              Log.Write("Error at photoread api" + ex.Message);
              Log.Write(ex.StackTrace);
              throw; // rethrow without resetting the stack trace
          }
      }
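The snippet above reads an image, so it does not cover the PDF case the question asks about. A minimal sketch of building a BatchAnnotateFilesRequest for a local PDF might look like the following. It assumes the Google.Cloud.Vision.V1 NuGet package, reuses the `myjsonfile.json` credentials file from the question, and takes the PDF path from the command line; the class name `PdfOcrSketch` and helper `BuildRequest` are hypothetical. Synchronous file annotation only handles a small number of pages per request (five, as far as I know), so only page 1 is requested here. Whether a table region actually comes back with a table block type is decided by the service and is not guaranteed.

```csharp
using System;
using System.IO;
using Google.Cloud.Vision.V1;
using Google.Protobuf;

public static class PdfOcrSketch
{
    // Hypothetical helper: wraps one local PDF in a BatchAnnotateFilesRequest.
    public static BatchAnnotateFilesRequest BuildRequest(string pdfPath)
    {
        var fileRequest = new AnnotateFileRequest
        {
            InputConfig = new InputConfig
            {
                // Inline file bytes; a GcsSource with a gs:// URI is the alternative.
                Content = ByteString.CopyFrom(File.ReadAllBytes(pdfPath)),
                MimeType = "application/pdf"
            },
            Features = { new Feature { Type = Feature.Types.Type.DocumentTextDetection } },
            Pages = { 1 } // 1-based page numbers to annotate
        };
        return new BatchAnnotateFilesRequest { Requests = { fileRequest } };
    }

    public static void Main(string[] args)
    {
        // Assumption: credentials file name taken from the question.
        ImageAnnotatorClient client = new ImageAnnotatorClientBuilder
        {
            CredentialsPath = "myjsonfile.json"
        }.Build();

        BatchAnnotateFilesResponse batch = client.BatchAnnotateFiles(BuildRequest(args[0]));

        foreach (AnnotateFileResponse file in batch.Responses)
        {
            foreach (AnnotateImageResponse page in file.Responses)
            {
                TextAnnotation annotation = page.FullTextAnnotation;
                if (annotation == null) continue;

                Console.WriteLine(annotation.Text);

                // List the block type the service assigned to each block.
                foreach (Page p in annotation.Pages)
                    foreach (Block block in p.Blocks)
                        Console.WriteLine($"{block.BlockType} (confidence {block.Confidence})");
            }
        }
    }
}
```

Run it with the PDF path as the only argument. For longer documents, the asynchronous file annotation API that writes results to Cloud Storage is probably the better fit.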

Votes 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/69443230
