1.Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model（联合识别手写文本和命名实体的神经端到端模型）
作者：Manuel Carbonell,Mauricio Villegas,Alicia Fornés,Josep Lladós
机构：Universitat Autonoma de Barcelona
摘要：When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module affect heavily the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different configurations: different ways of encoding the information, doing or not transfer learning and processing at text line or multi-line region level. The results are comparable to state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post processing.
2.A Study of Recent Contributions on Information Extraction（最近对信息抽取贡献的研究）
作者：Parisa Naderi Golshan,HosseinAli Rahmani Dashti,Shahrzad Azizi,Leila Safari
机构：University of Zanjan
摘要：This paper reports on modern approaches in Information Extraction (IE) and its two main sub-tasks of Named Entity Recognition (NER) and Relation Extraction (RE). Basic concepts and the most recent approaches in this area are reviewed, which mainly include Machine Learning (ML) based approaches and the more recent trend to Deep Learning (DL) based methods.
3.DeepProbe: Information Directed Sequence Understanding and Chatbot Design via Recurrent Neural Networks（通过递归神经网络对信息进行顺序理解和聊天机器人设计）
作者：Zi Yin,Keng-hao Chang,Ruofei Zhang
摘要：Information extraction and user intention identification are central topics in modern query understanding and recommendation systems. In this paper, we propose DeepProbe, a generic information-directed interaction framework which is built around an attention-based sequence to sequence (seq2seq) recurrent neural network. DeepProbe can rephrase, evaluate, and even actively ask questions, leveraging the generative ability and likelihood estimation made possible by seq2seq models. DeepProbe makes decisions based on a derived uncertainty (entropy) measure conditioned on user inputs, possibly with multiple rounds of interactions. Three applications, namely a rewritter, a relevance scorer and a chatbot for ad recommendation, were built around DeepProbe, with the first two serving as precursory building blocks for the third. We first use the seq2seq model in DeepProbe to rewrite a user query into one of standard query form, which is submitted to an ordinary recommendation system. Secondly, we evaluate DeepProbe's seq2seq model-based relevance scoring. Finally, we build a chatbot prototype capable of making active user interactions, which can ask questions that maximize information gain, allowing for a more efficient user intention idenfication process. We evaluate first two applications by 1) comparing with baselines by BLEU and AUC, and 2) human judge evaluation. Both demonstrate significant improvements compared with current state-of-the-art systems, proving their values as useful tools on their own, and at the same time laying a good foundation for the ongoing chatbot application.
4.Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction（基于同时自注意力对所有提及的全抽象生物关系进行提取）
作者：Patrick Verga,Emma Strubell,Andrew McCallum
机构：College of Information and Computer Sciences
摘要：Most work in relation extraction forms a prediction by looking at a short span of text within a single sentence containing a single entity pair mention. This approach often does not consider interactions across mentions, requires redundant computation for each mention pair, and ignores relationships expressed across sentence boundaries. These problems are exacerbated by the document- (rather than sentence-) level annotation common in biological text. In response, we propose a model which simultaneously predicts relationships between all mention pairs in a document. We form pairwise predictions over entire paper abstracts using an efficient self-attention encoder. All-pairs mention scores allow us to perform multi-instance learning by aggregating over mentions to form entity pair representations. We further adapt to settings without mention-level annotation by jointly training to predict named entities and adding a corpus of weakly labeled data. In experiments on two Biocreative benchmark datasets, we achieve state of the art performance on the Biocreative V Chemical Disease Relation dataset for models without external KB resources. We also introduce a new dataset an order of magnitude larger than existing human-annotated biological information extraction datasets and more accurate than distantly supervised alternatives.
5.Open Information Extraction on Scientific Text: An Evaluation（科学文本的公开信息提取：一次评估）
作者：Paul Groth,Michael Lauruhn,Antony Scerri,Ron Daniel Jr
摘要：Open Information Extraction (OIE) is the task of the unsupervised creation of structured information from text. OIE is often used as a starting point for a number of downstream tasks including knowledge base construction, relation extraction, and question answering. While OIE methods are targeted at being domain independent, they have been evaluated primarily on newspaper, encyclopedic or general web text. In this article, we evaluate the performance of OIE on scientific texts originating from 10 different disciplines. To do so, we use two state-of-the-art OIE systems applying a crowd-sourcing approach. We find that OIE systems perform significantly worse on scientific text than encyclopedic text. We also provide an error analysis and suggest areas of work to reduce errors. Our corpus of sentences and judgments are made available.
原文发布于微信公众号 - 专知（Quan_Zhuanzhi）