Approved for Public Release 15-3236; ©2015-The MITRE Corporation. All rights reserved.
Dropped Pronoun Recovery in Chinese SMS
Chris Giannella and Ransom Winder
The MITRE Corporation
7515 Colshire Drive
McLean, VA 22102, USA
{cgiannella,rwinder}@mitre.org
Stacy Petersen¹
Department of Linguistics
Georgetown University
3700 O Street NW
Washington, DC 20057, USA
sjp62@georgetown.edu
Abstract
In written Chinese, personal pronouns are commonly dropped when they can be
inferred from context. This practice is particularly common in informal genres like
Short Message Service (SMS) messages sent via cell phones. Restoring dropped
personal pronouns can be a useful preprocessing step for information extraction.
Dropped personal pronoun recovery can be divided into two subtasks: (1) detecting
dropped personal pronoun slots and (2) determining the identity of the pronoun for
each slot. We address a simpler version of restoring dropped personal pronouns
wherein only the person numbers are identified. After applying a word segmenter, we
used a linear-chain conditional random field (CRF) to predict which words were at the
start of an independent clause. Then, using the independent clause start information,
as well as lexical and syntactic information, we applied a CRF or a maximum-entropy
classifier to predict whether a dropped personal pronoun immediately preceded each
word and, if so, the person number of the dropped pronoun. We conducted a series of
experiments using a manually annotated corpus of Chinese SMS messages. Our
machine-learning-based approaches substantially outperformed a rule-based
approach based partially on rules developed by Chung and Gildea in 2010. Features
derived from parsing did not help our approaches. We conclude that the parse
information is largely superfluous for identifying dropped personal pronouns if
reasonably accurate independent clause start information is available.
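
To make the task framing concrete, the sketch below illustrates the per-word labeling described above: each segmented word carries a label saying whether a dropped personal pronoun immediately precedes it and, if so, its person number. This is only an illustration under stated assumptions, not the authors' implementation. The tag names (1SG, 2SG, and so on), the `<pro:...>` placeholder, and the helper names `restore_slots` and `token_features` are hypothetical, and the feature set is a guess at what "lexical" and "independent clause start" information might look like as classifier input.

```python
from typing import Dict, List

NONE = "NONE"
# Hypothetical person-number tag inventory; the actual label set is not given here.
PERSON_NUMBERS = {"1SG", "2SG", "3SG", "1PL", "2PL", "3PL"}


def restore_slots(tokens: List[str], labels: List[str]) -> List[str]:
    """Insert a placeholder before every word whose label marks a dropped pronoun."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    restored = []
    for token, label in zip(tokens, labels):
        if label != NONE:
            if label not in PERSON_NUMBERS:
                raise ValueError(f"unknown label: {label}")
            restored.append(f"<pro:{label}>")
        restored.append(token)
    return restored


def token_features(tokens: List[str], clause_starts: List[bool], i: int) -> Dict[str, object]:
    """Build a small feature dictionary for word i (assumed features, for illustration)."""
    return {
        "word": tokens[i],
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
        # Output of the first stage: does an independent clause start at this word?
        "clause_start": clause_starts[i],
    }


# "(你) 吃 饭 了 吗": "Have (you) eaten?" with the second-person subject dropped.
tokens = ["吃", "饭", "了", "吗"]
clause_starts = [True, False, False, False]
labels = ["2SG", NONE, NONE, NONE]

print(restore_slots(tokens, labels))
# ['<pro:2SG>', '吃', '饭', '了', '吗']
print(token_features(tokens, clause_starts, 0))
```

In a full system, the `labels` list would come from the second-stage CRF or maximum-entropy classifier, and `clause_starts` from the first-stage CRF, rather than being written by hand as they are here.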
1. Introduction
Chinese is commonly characterized as a “pro-drop” language (Baran, Yang, & Nianwen,
2012; Huang, 1989) because pronouns are commonly dropped when they can be
inferred from context. This practice is particularly common in informal genres like
¹ This author’s work was carried out while she was a summer intern at The MITRE Corporation.