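We are investigating whether we can include document OCR in our product, and we would prefer to use Azure Form Recognizer. However, when using a custom or composed model for document OCR, performance is very slow, usually more than 10 seconds. Is this normal? If not, how can we improve performance? The resource is on the S0 tier in a local region, and we are using the Azure.AI.FormRecognizer v3.1.1 .NET client: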
string endpoint = @"https://123.cognitiveservices.azure.com/";
string licenseKey = "123";
var credential = new AzureKeyCredential(licenseKey);
FormRecognizerClient client = new FormRecognizerClient(new Uri(endpoint), credential);
FormRecognizeResponse response = new FormRecognizeResponse()
{
    ImageID = imageID,
    ImageTypeID = imageTypeID,
    ImageTypeName = imageTypeName
};
// https://learn.microsoft.com/en-us/dotnet/api/overview/azure/ai.formrecognizer-readme
// ID Documents sample: https://github.com/Azure/azure-sdk-for-net/blob/Azure.AI.FormRecognizer_3.1.1/sdk/formrecognizer/Azure.AI.FormRecognizer/samples/Sample11_RecognizeIdentityDocuments.md
Stopwatch sw = Stopwatch.StartNew();
if (imageToDetect != null && imageToDetect.Length > 0)
{
    // Custom forms:
    var options = new RecognizeCustomFormsOptions()
    {
        IncludeFieldElements = false, // TODO: OK? We are not using this in the mapping
        //Pages = {"1-3","5-6"}
        //ContentType = FormContentType.Jpeg
    };
    using (var stream = new MemoryStream(imageToDetect))
    {
        try
        {
            RecognizeCustomFormsOperation operation = client.StartRecognizeCustomForms(modelID, stream, options);
            // Use the synchronous wait: WaitForCompletionAsync().Result would wrap a
            // RequestFailedException in an AggregateException and skip the first catch below.
            Response<RecognizedFormCollection> operationResponse = operation.WaitForCompletion();
            //RecognizedFormCollection forms = operationResponse.Value;
            response = MapToModel(imageID, imageTypeID, imageTypeName, operationResponse, false); // Pass fields and reset in caller
        }
        catch (RequestFailedException rfEx)
        {
            response.ErrorResponse = new FormRecognizeError()
            {
                code = rfEx.ErrorCode,
                statusCode = rfEx.Status,
                message = rfEx.Message
            };
            Console.WriteLine($"ERROR: {rfEx.ToString()}");
        }
        catch (Exception ex)
        {
            response.ErrorResponse = new FormRecognizeError()
            {
                statusCode = 400, //(int)HttpStatusCode.BadRequest,
                message = ex.Message
            };
            Console.WriteLine($"ERROR: {ex.ToString()}");
        }
    }
    sw.Stop();
    Console.WriteLine("---------------------------------------------------------------------------------");
    if (response.RecognizedForms?.Count() > 0)
        Console.WriteLine($"{sw.ElapsedMilliseconds} Milliseconds --> DetectForm {imageTypeName} response: RecognizedForms:{response.RecognizedForms?.Count()}. Confidence: {response.RecognizedForms[0].TypeConfidence} Error:{response.ErrorResponse?.message}");
    else
        Console.WriteLine($"{sw.ElapsedMilliseconds} Milliseconds --> DetectForm {imageTypeName} response: No forms detected. Error:{response.ErrorResponse?.message}");
    Console.WriteLine("---------------------------------------------------------------------------------");
    if (printDetail)
    {
        Console.WriteLine(JsonConvert.SerializeObject(response, Formatting.Indented));
        Console.WriteLine("---------------------------------------------------------------------------------");
    }
    Console.WriteLine("");
    // Error logged in caller?
}
else
{
    Console.WriteLine("ERROR: Empty image byte array");
}
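To narrow down where the 10+ seconds go, it may help to time the submit call and the polling wait separately rather than with a single stopwatch. The following is only a minimal sketch under assumptions: `FormTiming` and `RecognizeWithTimingAsync` are hypothetical names, and the 500 ms polling interval is an arbitrary value chosen for illustration (shorter intervals send more status requests to the service).

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;

public static class FormTiming
{
    // Hypothetical helper: measures the submit phase and the polling phase separately.
    public static async Task<RecognizedFormCollection> RecognizeWithTimingAsync(
        FormRecognizerClient client, string modelId, byte[] image)
    {
        using var stream = new MemoryStream(image);

        // Phase 1: upload the document and start the long-running operation.
        var submit = Stopwatch.StartNew();
        RecognizeCustomFormsOperation operation =
            await client.StartRecognizeCustomFormsAsync(modelId, stream);
        submit.Stop();

        // Phase 2: poll until the service reports completion.
        // The 500 ms interval is an assumption for illustration only.
        var wait = Stopwatch.StartNew();
        Response<RecognizedFormCollection> result =
            await operation.WaitForCompletionAsync(TimeSpan.FromMilliseconds(500), default);
        wait.Stop();

        Console.WriteLine(
            $"Submit: {submit.ElapsedMilliseconds} ms, wait: {wait.ElapsedMilliseconds} ms");
        return result.Value;
    }
}
```

If nearly all of the time falls in the wait phase, the delay is on the service side, and changes to the client code are unlikely to help much.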
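Posted on 2022-01-12 14:53:21
Yes, this is normal performance if Form Recognizer has not been trained with samples of the forms you want to extract OCR information from.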
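Posted on 2022-09-10 01:18:57
So, in today's web world, where people expect a response within 3 seconds, does this mean nobody would use Azure Form Recognizer for any real-time form processing? When I deposit a check at the bank, it goes through quickly, within a second. What is different? I would like my Form Recognizer implementation to come in under 4 seconds; currently I see an average of 15 seconds. It is a single-page form with about 160 labeled fields, using a custom model. I have to support two formats, so I plan to start composing models. Can anyone recommend other form processors?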
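For the two-format case, composing could look roughly like the sketch below, which uses `FormTrainingClient.StartCreateComposedModelAsync` from the same 3.1.x client library. The class and method names, the two model ID parameters, and the display name "two-format-model" are hypothetical placeholders. As far as I know, composing only works with models that were trained with labels, and nothing here suggests that a composed model will be any faster than a single custom model.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure;
using Azure.AI.FormRecognizer.Training;

public static class ComposeSketch
{
    // Hypothetical helper: combines two labeled custom models into one composed model
    // and returns the new model ID.
    public static async Task<string> ComposeAsync(
        string endpoint, string key, string modelIdA, string modelIdB)
    {
        var trainingClient = new FormTrainingClient(
            new Uri(endpoint), new AzureKeyCredential(key));

        // IDs of the two existing custom models, one per form format (assumed to exist).
        var modelIds = new List<string> { modelIdA, modelIdB };

        CreateComposedModelOperation operation =
            await trainingClient.StartCreateComposedModelAsync(modelIds, "two-format-model");
        Response<CustomFormModel> response = await operation.WaitForCompletionAsync();

        return response.Value.ModelId;
    }
}
```

The returned ID can then be passed to `StartRecognizeCustomForms` in place of a single model's ID, so the calling code does not need to know in advance which of the two formats it is sending.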
https://stackoverflow.com/questions/70682253