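I want to use Google Vision to extract a PDF to text/tables. My PDF contains a table that I want to extract (BlockType = table).
However, I don't know how to do this in C#.
I installed the Google.Cloud.Vision.API NuGet package and tried the DetectDocumentText method, but it seems to accept only images.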
var client = new ImageAnnotatorClientBuilder
{
    CredentialsPath = @"myjsonfile.json"
}.Build();
Image image = Image.FromUri("https://storage.cloud.google.com/pathtomyfile.pdf");
TextAnnotation response = client.DetectDocumentText(image); // Getting error for a bad image.
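Then I looked for a file-based method and found BatchAnnotateFilesAsync, but I don't know how to build the BatchAnnotateFilesRequest object it requires, and I couldn't find any C# examples.
Can anyone help me figure out how to extract a PDF document into text with the table block type?
Thanks in advance.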
Posted on 2022-05-17 06:16:28
private string ScanPDFWithGoogle(string path)
{
    try
    {
        // DetectText accepts image files (PNG, JPEG, ...); it does not accept a PDF directly.
        var image = Google.Cloud.Vision.V1.Image.FromFile(path);
        Log.Write("In photoread try catch block : " + image.ToString());
        var credentialPath = ConfigurationManager.AppSettings["GOOGLE_APPLICATION_CREDENTIALS"];
        Log.Write("In photoread try catch block after credential : " + credentialPath);
        // Build the client from the service-account JSON file.
        ImageAnnotatorClient client = new ImageAnnotatorClientBuilder
        {
            CredentialsPath = credentialPath
        }.Build();
        Log.Write("Channel" + client.ToString());
        var response = client.DetectText(image);
        return response.ToString();
    }
    catch (Exception ex)
    {
        Log.Write("Error at photoread api" + ex.Message);
        Log.Write(ex.StackTrace);
        throw; // rethrow without resetting the stack trace
    }
}
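The method above only handles image files, so it does not cover the PDF case from the question. For a PDF, a minimal sketch using the synchronous BatchAnnotateFiles call from Google.Cloud.Vision.V1 might look like the following. The class name PdfOcrSketch, the helper BuildPdfRequest, and the file names myjsonfile.json and myfile.pdf are placeholders for illustration, not part of the library. As far as I know the synchronous call handles only a few pages per request (five at the time of writing), and the service decides each block's type, so a visually tabular region is not guaranteed to come back as Table.

```csharp
using System;
using System.IO;
using Google.Cloud.Vision.V1;
using Google.Protobuf;

public static class PdfOcrSketch
{
    // Hypothetical helper: wraps raw PDF bytes in an AnnotateFileRequest
    // that asks for document text detection on the given 1-based pages.
    public static AnnotateFileRequest BuildPdfRequest(byte[] pdfBytes, params int[] pages)
    {
        var request = new AnnotateFileRequest
        {
            InputConfig = new InputConfig
            {
                Content = ByteString.CopyFrom(pdfBytes),
                MimeType = "application/pdf"
            },
            Features = { new Feature { Type = Feature.Types.Type.DocumentTextDetection } }
        };
        request.Pages.AddRange(pages);
        return request;
    }

    public static void Main(string[] args)
    {
        // Placeholder paths; replace with your own credentials file and PDF.
        var client = new ImageAnnotatorClientBuilder
        {
            CredentialsPath = "myjsonfile.json"
        }.Build();

        AnnotateFileRequest request = BuildPdfRequest(File.ReadAllBytes("myfile.pdf"), 1, 2);

        // This overload builds the BatchAnnotateFilesRequest internally.
        BatchAnnotateFilesResponse batch = client.BatchAnnotateFiles(new[] { request });

        foreach (AnnotateFileResponse fileResponse in batch.Responses)
        {
            // One AnnotateImageResponse per processed page.
            foreach (AnnotateImageResponse pageResponse in fileResponse.Responses)
            {
                TextAnnotation annotation = pageResponse.FullTextAnnotation;
                if (annotation == null)
                {
                    continue; // no text detected on this page
                }

                Console.WriteLine(annotation.Text);
                foreach (Page page in annotation.Pages)
                {
                    foreach (Block block in page.Blocks)
                    {
                        // BlockType is an enum: Unknown, Text, Table, Picture, Ruler, Barcode.
                        Console.WriteLine("Block type: " + block.BlockType);
                    }
                }
            }
        }
    }
}
```

If the PDF lives in Cloud Storage instead, InputConfig also has a GcsSource property whose Uri takes a gs:// path rather than an https://storage.cloud.google.com/ URL.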
https://stackoverflow.com/questions/69443230