sensors
Article
ConAnomaly: Content-Based Anomaly Detection for System Logs
Dan Lv, Nurbol Luktarhan * and Yiyong Chen

 
Citation: Lv, D.; Luktarhan, N.; Chen, Y. ConAnomaly: Content-Based Anomaly Detection for System Logs. Sensors 2021, 21, 6125. https://doi.org/10.3390/s21186125
Academic Editors: Hamed Badihi, Tao Chen and Ningyun Lu
Received: 7 August 2021
Accepted: 7 September 2021
Published: 13 September 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China;
lvdan@stu.xju.edu.cn (D.L.); chenyiyong578@163.com (Y.C.)
* Correspondence: nurbol@xju.edu.cn
Abstract:
Enterprise systems typically produce a large number of logs to record runtime states and important events. Log anomaly detection is valuable for business management and system maintenance. Most existing log-based anomaly detection methods use a log parser to obtain log event indexes or event templates and then apply machine learning methods to detect anomalies. However, these methods cannot handle unknown log types and do not take advantage of the semantic information in logs. In this article, we propose ConAnomaly, a log-based anomaly detection model composed of a log sequence encoder (log2vec) and a multi-layer Long Short-Term Memory (LSTM) network. We designed log2vec based on the Word2vec model: it first vectorizes the words in the log content, then deletes invalid words through part-of-speech tagging, and finally obtains the sequence vector by a weighted average. In this way, ConAnomaly not only captures semantic information in the log but also leverages the sequential relationships between logs. We evaluate the proposed approach on two log datasets. Our experimental results show that ConAnomaly has good stability, can deal with unseen log types to a certain extent, and outperforms most log-based anomaly detection methods.
Keywords: log anomaly detection; log sequence encoder; LSTM
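The log2vec steps summarized in the abstract (vectorize words, drop invalid words by part-of-speech tag, take a weighted average) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the word vectors come from a tiny hypothetical lookup table (`WORD_VECTORS`) instead of a trained Word2vec model, tokens arrive already paired with POS tags instead of passing through a tagger, and the dropped tag set (`DROP_TAGS`), the per-word weights (`WEIGHTS`), and the `sliding_windows` helper that groups vectors for a sequence model are all invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy word vectors; ConAnomaly learns its vectors with Word2vec.
WORD_VECTORS: Dict[str, List[float]] = {
    "receiving": [1.0, 0.0, 0.0],
    "block": [0.0, 1.0, 0.0],
    "exception": [0.0, 0.0, 1.0],
    "the": [0.5, 0.5, 0.5],
}

# Assumption: POS tags whose words count as "invalid" and are dropped
# (determiners, prepositions, numbers; Penn Treebank tag names).
DROP_TAGS = {"DT", "IN", "CD"}

# Assumption: illustrative per-word weights; unlisted words get weight 1.0.
WEIGHTS: Dict[str, float] = {"exception": 2.0}


def log2vec(tagged_tokens: Sequence[Tuple[str, str]], dim: int = 3) -> List[float]:
    """Encode one log message as the weighted average of its kept word vectors."""
    total = [0.0] * dim
    weight_sum = 0.0
    for word, tag in tagged_tokens:
        word = word.lower()
        # Skip words filtered out by POS tag and words with no known vector.
        if tag in DROP_TAGS or word not in WORD_VECTORS:
            continue
        weight = WEIGHTS.get(word, 1.0)
        for i, value in enumerate(WORD_VECTORS[word]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # nothing usable survived filtering: zero vector
    return [value / weight_sum for value in total]


def sliding_windows(vectors: List[List[float]], size: int) -> List[List[List[float]]]:
    """Hypothetical helper: group consecutive log vectors into fixed-length windows."""
    return [vectors[i:i + size] for i in range(len(vectors) - size + 1)]


if __name__ == "__main__":
    # Tokens paired with POS tags, as a tagger might produce them.
    session = [
        [("Receiving", "VBG"), ("block", "NN"), ("123", "CD")],
        [("Exception", "NN"), ("in", "IN"), ("the", "DT"), ("block", "NN")],
    ]
    vectors = [log2vec(tokens) for tokens in session]
    print(vectors)
    print(sliding_windows(vectors, size=2))
```

Because each message is encoded from its words rather than mapped to a fixed event index, a message never seen before still receives a vector whenever some of its words are in the vocabulary. In ConAnomaly, sequences of such vectors are what the multi-layer LSTM consumes; the window length here is an arbitrary choice.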
1. Introduction
As user demands grow, modern systems are becoming more complex every day. The more complex a system is, the more likely it is to contain vulnerabilities that an intruder can exploit to launch attacks. As a result, anomaly detection has become an important task in building trusted computer systems [1]. An accurate and effective anomaly detection model can limit the damage that anomalies cause to a system, which is very important for business management and system maintenance. Logs are widely used to record important events and system status in operating systems and other software systems. Because system logs contain noteworthy events and runtime states, they are one of the most valuable data sources for anomaly detection and system monitoring [2].
Logs are semi-structured text data, and anomaly detection is one of the important tasks performed on them [3]. Log anomaly detection differs from anomaly detection on computer vision data [4–6], numerical time series [7–9], and graphic data [10]. The traditional way of handling log anomalies is very inefficient: operators manually check system logs based on their domain knowledge, matching regular expressions or searching for keywords (such as "error" and "Failure"). This approach does not scale to large systems.
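The manual approach described above amounts to scanning every line for hand-written patterns. A minimal sketch, assuming a hypothetical rule list (`ALERT_PATTERNS`) that an operator might derive from domain knowledge, shows why it scales poorly: each new failure mode needs a new rule, and a problem reported in different wording is silently missed.

```python
import re
from typing import Iterable, List

# Hypothetical rules an operator might write from domain knowledge.
ALERT_PATTERNS = [
    re.compile(r"\berror\b", re.IGNORECASE),
    re.compile(r"\bfailure\b", re.IGNORECASE),
    re.compile(r"exception", re.IGNORECASE),
]


def flag_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that match at least one alert pattern."""
    return [line for line in lines if any(p.search(line) for p in ALERT_PATTERNS)]


if __name__ == "__main__":
    sample = [
        "INFO Receiving block blk_1 src: /10.0.0.1",
        "WARN Failure to write block blk_2",
        "INFO Connection reset by peer",  # possibly abnormal, yet no rule matches
    ]
    for line in flag_lines(sample):
        print(line)
```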
A growing number of studies process logs automatically. Existing log-based system anomaly detection methods can be roughly classified into two categories: methods based on log event indexes, such as PCA [11], Invariant Mining [12], DeepLog [13], and QLLog [14], and methods based on log templates, such as LogAnomaly [15] and LogRobust [16]. Although both categories first parse the logs, they differ in two ways. First, the log event index-based methods convert each log to an event index, while the log template-based methods remove the numeric information in the log to
Sensors 2021, 21, 6125. https://doi.org/10.3390/s21186125 https://www.mdpi.com/journal/sensors
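The first difference between the two families of methods can be illustrated with a toy parser. The sketch below is hypothetical and stands in for a real log parser: it assumes that any token containing a digit is a variable parameter and masks it with a regular expression (`NUMERIC_TOKEN`), then either keeps the resulting template text (the template-based view) or maps each distinct template to an integer event index (the index-based view).

```python
import re
from typing import Dict, List, Tuple

# Assumption: treat any whitespace-delimited token containing a digit as a parameter.
NUMERIC_TOKEN = re.compile(r"\S*\d\S*")


def to_template(message: str) -> str:
    """Template-based view: replace tokens containing digits with a wildcard."""
    return NUMERIC_TOKEN.sub("<*>", message)


def to_event_indexes(messages: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Index-based view: map each distinct template to an integer event ID."""
    table: Dict[str, int] = {}
    indexes: List[int] = []
    for message in messages:
        template = to_template(message)
        if template not in table:
            table[template] = len(table)  # first occurrence gets the next ID
        indexes.append(table[template])
    return indexes, table


if __name__ == "__main__":
    logs = [
        "Receiving block blk_1 of size 67108864",
        "Receiving block blk_2 of size 512",
        "Deleting block blk_3",
    ]
    print([to_template(m) for m in logs])
    print(to_event_indexes(logs))
```

An index-based detector sees only the integer sequence, so a template that never appeared during training carries no learned meaning; this matches the limitation the abstract attributes to these methods when facing unknown log types.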