I am using the Form Parser in Google Document AI.
When I send this request:

curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" -d @request.json https://eu-documentai.googleapis.com/v1beta3/projects/<project id>/locations/eu/processors/<processor id>:process

I get back a document with no entities, structured like this:
{
  "document": {
    "uri": "",
    "mimeType": "application/pdf",
    "text": "Pascal Carrié\nincwo SAS\nx2289475d\nc/ santa isabel, 12, 4D\nN° TVA FR33494952401\n28012 Madrid\n16 rue de La Comète\nIban :\nES76 3023 0047 4866 6328 6612\n75007 Paris\nrib:\nBCOEESMM023\nFRANCE\nFactura nº 2020/02\nFecha\n11/2/20\nConcepto\nPrecio\nCuandidad\nIVA\nImporte\ndéveloppement backend\n4250\n1\n0%\n4,250.00 €\nCondicións de pago : A la recepción de la factura\nBase Imponible\n4,250.00 €\nTotal IVA\n0.00 €\nTOTAL\n4,250.00 €\nForma de pago\nContado\n",
    "pages": [
      {
        "pageNumber": 1,
        "dimension": {
          "width": 2378,
          "height": 1681,
          "unit": "pixels"
        },
        "layout": {
          "textAnchor": {
            "textSegments": [
              {
                "endIndex": "431"
              }
            ]
          },
          "boundingPoly": {
            "vertices": [
              {},
              { ...

It does not seem to be analyzed at all. What am I doing wrong?
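When I upload the same document in the online demo, it works fine.
I don't think the problem is base64: I encoded my document as described in the documentation and got a string.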
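For reference, here is a minimal sketch of producing the `request.json` that the curl command posts. It assumes the v1beta3 `:process` body carries the file under `rawDocument` with a base64 `content` field and a `mimeType`. The function name `build_process_request` and the file paths in the usage line are hypothetical.

```php
<?php
// Hypothetical helper: wrap raw PDF bytes in a Document AI :process request body.
// Assumes the v1beta3 ProcessRequest shape {"rawDocument": {"content", "mimeType"}}.
function build_process_request(string $pdfBytes, string $mimeType = 'application/pdf'): string
{
    $body = [
        'rawDocument' => [
            // JSON carries bytes fields as standard base64; base64_encode adds no line breaks.
            'content'  => base64_encode($pdfBytes),
            'mimeType' => $mimeType,
        ],
    ];
    // Keep "/" unescaped so the base64 string is written verbatim.
    return json_encode($body, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
}

// Usage (placeholder paths):
// file_put_contents('request.json', build_process_request(file_get_contents('invoice.pdf')));
```

If the encoded string contains line breaks (for example, from a shell `base64` that wraps its output), the request body will not match what the API expects, so an unwrapped encoding is the safer choice.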
Posted on 2021-01-01 21:59:25
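The Form Parser reports the key/value pairs it detects in each page's formFields rather than in entities, so you need to iterate over the pages->formFields objects. Each one contains a fieldName and a fieldValue, and each of those has a textAnchor->textSegments list with startIndex and endIndex. With these offsets you can take the corresponding substring of the document's "text" property.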
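The substring step can be wrapped in a small helper. This is a sketch, not part of any Document AI client library: `anchor_text` is a hypothetical name, and it assumes the offsets are character (code point) positions in `text`, so it uses `mb_substr` rather than the byte-based `substr`. It also treats a missing `startIndex` as 0, because the JSON output omits zero-valued fields (the page layout in the question has only `"endIndex": "431"`), and it joins every segment instead of reading only the first.

```php
<?php
// Hypothetical helper: resolve a textAnchor against the document's full text.
// $anchor is a decoded object such as $field->fieldName->textAnchor.
function anchor_text(string $text, ?object $anchor): string
{
    if ($anchor === null || empty($anchor->textSegments)) {
        return '';
    }
    $out = '';
    foreach ($anchor->textSegments as $segment) {
        // Offsets arrive as JSON strings; an omitted startIndex means 0.
        $start = (int) ($segment->startIndex ?? 0);
        $end   = (int) ($segment->endIndex ?? 0);
        // mb_substr counts characters, so "é" or "€" earlier in the text does not shift offsets.
        $out .= mb_substr($text, $start, $end - $start, 'UTF-8');
    }
    return $out;
}
```

With the question's text, an anchor holding only `{"endIndex": "13"}` resolves to `Pascal Carrié`.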
A basic example in PHP that reads a saved response from disk:
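<?php
// Load a saved :process response. The API wraps the result in "document";
// fall back to the top level in case only the Document object was saved.
$json = json_decode(file_get_contents('d:\tmp\response.json'));
$doc = $json->document ?? $json;
$text = $doc->text;
$fields = [];
foreach ($doc->pages as $pag) {
    // A page with no detected key/value pairs may omit formFields entirely.
    foreach ($pag->formFields ?? [] as $field) {
        // Offsets are JSON strings, and a startIndex of 0 is omitted from the output.
        $seg = $field->fieldName->textAnchor->textSegments[0];
        $from = (int) ($seg->startIndex ?? 0);
        // mb_substr counts characters, not bytes, so "é" and "€" do not shift offsets;
        // trim() removes surrounding whitespace such as a trailing newline.
        $name = trim(mb_substr($text, $from, (int) $seg->endIndex - $from, 'UTF-8'));
        $seg = $field->fieldValue->textAnchor->textSegments[0];
        $from = (int) ($seg->startIndex ?? 0);
        $value = trim(mb_substr($text, $from, (int) $seg->endIndex - $from, 'UTF-8'));
        $fields[] = [$name => $value];
    }
}
print_r($fields);

https://stackoverflow.com/questions/65479560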