An Open Relation Extraction System for Web Text Information


Citation: Li, H.; Liu, B. An Open Relation Extraction System for Web Text Information. Appl. Sci. 2022, 12, 5718. https://doi.org/10.3390/app12115718

Academic Editors: Katia Lida Kermanidis, Phivos Mylonas and Manolis Maragoudakis

Received: 6 May 2022

Accepted: 31 May 2022

Published: 4 June 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Applied Sciences

Article

An Open Relation Extraction System for Web Text Information

Huagang Li and Bo Liu *

College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; lihuagang21@163.com

* Correspondence: kyle.liu@nudt.edu.cn

Abstract: Web texts typically undergo open-ended growth of new relations. Traditional relation extraction methods lack automatic annotation and perform poorly on new relation extraction tasks. We propose an open-domain relation extraction system (ORES) based on distant supervision and few-shot learning to solve this problem. More specifically, we utilize tBERT to design instance selector 1, implementing automatic labeling in the data mining component. Meanwhile, we design example selector 2 based on K-BERT in the new relation extraction component. The real-time data management component outputs new relational data. Experiments show that ORES can select higher-quality and more diverse instances for better new relation learning. It achieves significant improvement compared to Neural Snowball with fewer seed sentences.

Keywords: open relation extraction; few-shot learning; knowledge extraction; tBERT; K-BERT

1. Introduction

Information and knowledge are the basis for the development of human society. Text records 80 percent of the information of human civilization (https://breakthroughanalysis.com/2008/08/01/unstructured-data-and-the-80-percent-rule/, accessed on 5 May 2022). The core task of information extraction (IE) is to obtain structured triples from unstructured text. It relies on two fundamental tasks: entity recognition and relation extraction (RE). Li et al. [1] proposed an entity recognition method that performs well. For relation extraction, predicting new relations is a challenge. Traditional relation extraction mainly adopts supervised learning methods for predefined relations; in essence, it transforms relation extraction into relation classification. There are two paradigms: pipeline relation extraction [ ] and joint relation extraction [ ]. Traditional RE performs well but faces two challenges. The first is that classification over predefined relations does not work well on new relation extraction tasks. The second is that relational data relies too heavily on manual cleaning and labeling, which is costly. In addition, for large-scale knowledge bases such as Wikidata, manual annotation is difficult to complete.
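To make the first challenge concrete, the following is a minimal sketch of relation extraction treated as closed-set classification. It is not the method of this paper: the `Triple` type, the `PREDEFINED_RELATIONS` inventory, and the keyword-matching `classify_relation` function are hypothetical stand-ins for a trained classifier, chosen only to show that a fixed label set cannot name a relation it was never given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """A structured (head entity, relation, tail entity) fact."""
    head: str
    relation: str
    tail: str


# Hypothetical predefined relation inventory with keyword cues.
# A real system would use a trained classifier instead of keywords.
PREDEFINED_RELATIONS = {
    "founder_of": ("founded", "founder of"),
    "born_in": ("born in",),
}


def classify_relation(sentence: str, head: str, tail: str) -> Optional[Triple]:
    """Assign one of the predefined labels to a known entity pair, or None."""
    text = sentence.lower()
    for label, cues in PREDEFINED_RELATIONS.items():
        if any(cue in text for cue in cues):
            return Triple(head, label, tail)
    # The relation is outside the predefined set, so nothing can be returned.
    return None


known = classify_relation("Steve Jobs founded Apple in 1976.", "Steve Jobs", "Apple")
unseen = classify_relation("Tim Cook is the CEO of Apple.", "Tim Cook", "Apple")
```

Here `known` receives a label from the inventory, while `unseen` is `None`: the second sentence expresses a relation, but no predefined class exists for it.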

To solve this problem, Banko [ ] first proposed the concept of open information extraction, that is, extracting structured relational facts from open and growing unstructured text. Information extraction should not be limited to a small set of known relations; RE should be able to extract a wide variety of relations in a text. The scope of this research is the setting in which the entity pair of a relation is known and the relation type between the entity pair is unrestricted. Open-domain relation extraction should meet three academic requirements: automation, non-homologous corpus, and high efficiency.

• Automation

The open relation extraction system can execute automatically, and the algorithm needs to pass over the corpus only once to extract triples. It should be based on an unsupervised extraction strategy and must not depend on predefined relations. In addition, the cost of manually constructing training samples is small: only a tiny number of initialization seeds need to be labeled, or a small number of extraction templates need to be defined.
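The single-pass, label-free behaviour described under Automation can be sketched as follows. This is an illustrative heuristic under stated assumptions, not the ORES pipeline: each corpus item is assumed to arrive with its entity pair already identified, and the hypothetical helper `extract_between` simply takes the text between the two entities as the relation phrase, so no predefined relation list is needed.

```python
from typing import List, Optional, Tuple

# (sentence, head entity, tail entity); the entity pair is assumed to be known.
Example = Tuple[str, str, str]
# (head entity, relation phrase, tail entity)
Extraction = Tuple[str, str, str]


def extract_between(sentence: str, head: str, tail: str) -> Optional[Extraction]:
    """Use the text between head and tail as an open relation phrase."""
    start = sentence.find(head)
    if start == -1:
        return None
    # Only accept a tail that occurs after the head in the sentence.
    end = sentence.find(tail, start + len(head))
    if end == -1:
        return None
    phrase = sentence[start + len(head):end].strip(" ,.")
    return (head, phrase, tail) if phrase else None


def single_pass(corpus: List[Example]) -> List[Extraction]:
    """Visit every sentence exactly once and collect the extracted triples."""
    triples = []
    for sentence, head, tail in corpus:
        triple = extract_between(sentence, head, tail)
        if triple is not None:
            triples.append(triple)
    return triples


corpus = [
    ("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw"),
    ("Tim Cook is the CEO of Apple.", "Tim Cook", "Apple"),
    ("Apple, Tim Cook.", "Tim Cook", "Apple"),  # tail precedes head: skipped
]
triples = single_pass(corpus)
```

The relation phrases ("was born in", "is the CEO of") come from the text itself rather than from a fixed label set, which is the property the automation requirement asks for. A practical system would still need to normalise and filter such phrases.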

