
xLoad

Explainable Valuation of Log Data for Deep Learning Based Anomaly Detection

Figure 1. End-to-end xLoad workflow.
Figure 2. Aggregation of SHAP values for template-level relevance.
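The aggregation idea behind Figure 2 can be illustrated with a minimal sketch (not the repository's implementation): per-occurrence SHAP attributions are grouped by template ID and averaged on absolute value, yielding a template-level relevance ranking.

```python
from collections import defaultdict

def template_relevance(shap_values, template_ids):
    """Aggregate per-event SHAP values into template-level relevance.

    shap_values:  one SHAP attribution per log event occurrence
    template_ids: the template ID of each corresponding occurrence
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, tid in zip(shap_values, template_ids):
        sums[tid] += abs(value)   # magnitude matters, not sign
        counts[tid] += 1
    # mean absolute SHAP per template, ranked high-to-low
    relevance = {tid: sums[tid] / counts[tid] for tid in sums}
    return sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)

ranking = template_relevance(
    shap_values=[0.9, -0.8, 0.05, 0.1, 0.02],
    template_ids=["E5", "E5", "E22", "E5", "E22"],
)
# template E5 (mean |SHAP| 0.6) ranks above E22 (0.035)
```

Templates at the bottom of such a ranking are the candidates for purging in the later steps.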

This repository provides the end-to-end pipeline to:

  • parse raw logs into templates,
  • train DL-based log anomaly detectors,
  • explain model inputs with SHAP,
  • rank template relevance,
  • distill (purge) low-relevance logs,
  • re-train and evaluate the resulting models.

Artifact Notes

  • The open dataset (HDFS) can be downloaded from LogHub.
  • The proprietary production dataset (Raptor) is not publicly released.

Why xLoad Is Stronger Than Random Truncation

  1. Keeps detection quality with less data
    Across representative DL models and two large-scale datasets, up to roughly 30% of low-relevance logs can be removed while anomaly detection metrics (Accuracy / Recall / F1) remain largely stable.

  2. Clearly better than random truncation
    At the same truncation ratio, relevance-based truncation (xLoad) preserves F1 and Recall better than random truncation: the paper reports a clear metric decline under random truncation, while xLoad remains robust.

  3. Substantial training-time reduction
    With around 30% truncation, training-time reduction is often around 50% (model- and dataset-dependent).
    This benefit is shown consistently in the training-time table.

  4. Good temporal durability
    Models trained on distilled logs can stay effective over later time slices, reducing re-training frequency in practical settings.

Figure 3. HDFS template relevance patterns across models.
Figure 4. Raptor top/bottom relevance ranking comparison.

Metrics

Performance curves under truncation (xLoad relevance-based vs. random truncation)

Figure 5. HDFS / DeepLog.
Figure 6. HDFS / CNN.
Figure 7. HDFS / LogRobust.
Figure 8. HDFS / Logsy.
Figure 9. Raptor / DeepLog.
Figure 10. Raptor / CNN.
Figure 11. Raptor / LogRobust.
Figure 12. Raptor / Logsy.
Figure 13. Legend used in the performance metric figures.

Durability curves over time (model stability after distillation)

Figure 14. Durability (30% truncation) / DeepLog.
Figure 15. Durability (30% truncation) / CNN.
Figure 16. Durability (30% truncation) / LogRobust.
Figure 17. Durability (30% truncation) / Logsy.
Figure 18. Durability (20% truncation) / DeepLog.
Figure 19. Durability (20% truncation) / CNN.
Figure 20. Durability (20% truncation) / LogRobust.
Figure 21. Durability (20% truncation) / Logsy.
Figure 22. Legend used in the durability figures.

Project Structure

.
|-- README.md
|-- conf
|   |-- config.yaml        # Main xLoad settings
|   |-- drain3.ini         # Template mining settings (HDFS example)
|   `-- log.yaml           # Logging config
|-- epurger
|   |-- __main__.py        # CLI entry
|   |-- parser.py          # Template mining / parsing
|   |-- preprocessor.py    # Windowing, feature preparation, train/test split
|   |-- trainer.py         # Model training / evaluation
|   |-- explainer.py       # SHAP explanation
|   |-- purger.py          # Relevance-based log purging
|   |-- figure.py          # Figure generation utilities
|   `-- models             # DeepLog, LogRobust, CNN, Logsy implementations
`-- requirements.txt

Environment

  • Python 3.10 is recommended.
  • A Conda environment is recommended.
  • CPU-only execution is supported, but an NVIDIA GPU with CUDA is strongly recommended for performance.
  • Runs on Windows, macOS, and Linux.

Install dependencies:

pip install -r requirements.txt

For LogRobust, download pretrained FastText vectors and place them in data/pretrain.

Note: LogRobust can require very large RAM (about 50 GB free RAM on HDFS in our practice).
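FastText commonly ships vectors in the plain-text .vec format: a header line with the vocabulary size and dimensionality, then one word followed by its floats per line. A short illustrative reader (not the repository's loader; in practice a library such as gensim is typically used):

```python
import io

def load_vec(fileobj, limit=None):
    """Read word vectors in FastText .vec text format:
    first line "<count> <dim>", then "<word> f1 f2 ... fdim" per line."""
    header = fileobj.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for i, line in enumerate(fileobj):
        if limit is not None and i >= limit:
            break  # loading only the top of a large vocabulary saves RAM
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors

# tiny in-memory example standing in for a real .vec file
sample = io.StringIO("2 3\nblock 0.1 0.2 0.3\nreceived -0.5 0.0 0.25\n")
dim, vecs = load_vec(sample)
```

Capping the vocabulary with `limit` is one practical way to reduce the memory pressure noted above.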

Reproduction Workflow (HDFS)

Example in HDFS dataset

Parse(par): Get structured logs

First, download HDFS.log (about 1.5 GB) from LogHub.

python -m epurger par -s HDFS -i data/HDFS.log

Key parameters:

  • -s, --dataset: dataset type (HDFS or Raptor).
  • -i, --input: raw log file/directory path.
  • -o, --output: optional output directory for parsed logs. If omitted, default is <input>-par.
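Template mining in par is done with Drain3 (configured via conf/drain3.ini). Conceptually, mining replaces variable fields with a wildcard so that structurally identical lines collapse into one template. A toy sketch of that idea, far simpler than Drain, that only masks numeric-looking tokens:

```python
import re

def to_template(line):
    """Toy template extraction: mask block IDs, IPs, and integers with
    the <*> wildcard. Drain builds templates far more robustly."""
    line = re.sub(r"blk_-?\d+", "<*>", line)                  # HDFS block IDs
    line = re.sub(r"\d+\.\d+\.\d+\.\d+(:\d+)?", "<*>", line)  # IP[:port]
    line = re.sub(r"\b\d+\b", "<*>", line)                    # bare integers
    return line

logs = [
    "Received block blk_3587508140051953248 of size 67108864 from 10.251.42.84",
    "Received block blk_-1608999687919862906 of size 91178 from 10.250.10.6",
]
templates = {to_template(l) for l in logs}
# both raw lines collapse into the single template
# "Received block <*> of size <*> from <*>"
```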

Preprocess(pre): Preprocess data and split train/test dataset

You need anomaly_label.csv (also from LogHub) to build labeled structured data.

python -m epurger pre -s HDFS -f data/HDFS.log-par/HDFS_structured.csv -l data/anomaly_label.csv

After this step, you will get a preprocessed dataset such as: data/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/

Key parameters:

  • -f, --file: structured log csv generated by par.
  • -l, --label: anomaly label file.
  • -tr, --test-ratio: test split ratio (default 0.2).
  • -a, --anomaly-ratio: anomaly ratio in training set (default 0.0).
  • -ta, --testAnomalyRatio: anomaly ratio in test set (default 1.0).
  • -t, --template: optional template file for filtering/preprocess.
  • --threshold: template filtering threshold.
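For HDFS, preprocessing groups events into per-block sessions and labels each session from anomaly_label.csv. A simplified sketch of the grouping and a deterministic train/test split (hypothetical field layout, not the repository's code; the real pipeline shuffles and also controls anomaly ratios via -a / -ta):

```python
import re
from collections import defaultdict

def build_sessions(structured_rows):
    """Group event IDs into per-block sessions (HDFS style).
    structured_rows: (log content, event/template ID) pairs."""
    sessions = defaultdict(list)
    for content, event_id in structured_rows:
        match = re.search(r"blk_-?\d+", content)
        if match:
            sessions[match.group()].append(event_id)
    return dict(sessions)

def split_sessions(session_ids, test_ratio=0.2):
    """Deterministic tail split by session count."""
    n_test = int(len(session_ids) * test_ratio)
    cut = len(session_ids) - n_test
    return session_ids[:cut], session_ids[cut:]

rows = [
    ("Receiving block blk_1 src ...", "E5"),
    ("Receiving block blk_2 src ...", "E5"),
    ("PacketResponder for blk_1 terminating", "E9"),
]
sessions = build_sessions(rows)
train, test = split_sessions(sorted(sessions), test_ratio=0.5)
```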

Train(tra) and Explain(exp/exps): Train and explain models

When -t (a template file path) is provided to tras, trained models are explained automatically. Generating SHAP values itself does not require a template file, but generating ranking files does.

python -m epurger tras \
  -m CNN_Logsy_DeepLog_RobustLog \
  -d data/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/ \
  -t data/HDFS.log-par/HDFS_templates.csv

Outputs:

  • models: workspace/models/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/
  • SHAP results: workspace/shap/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/

Key parameters:

  • -m, --model: model list joined by _, for example CNN_Logsy_DeepLog_RobustLog.
  • -d, --data: preprocessed dataset directory.
  • -e, --epochs: training epochs (default follows internal config).
  • -t, --template: template file path; used for ranking generation after explanation.
  • -l, --limit: max data size for explanation stage.

Purge(prg): Purge logs then train and evaluate models

Because paths differ across environments, here is a practical multi-model example. Prepare a copy of anomaly_label.csv in data/HDFS.log-par/ before purging.

nohup python -m epurger prg \
  -f data/HDFS.log-par/HDFS_structured.csv \
  -r workspace/shap/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/ab852d2e-CNN-16384-20230724-175732,8ad0b317-Logsy-16384-20230724-184401,8a5adec2-DeepLog-16384-20230724-175216 \
  -m workspace/models/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/ab852d2e-CNN,8ad0b317-Logsy,8a5adec2-DeepLog \
  --dataset HDFS \
  -b 5 \
  -s 30 \
  --recursive 1 \
  -t data/HDFS.log-par/HDFS_templates.csv \
  > nohup.out &

This command purges logs and retrains/evaluates model pairs at purge ratios 5%, 10%, 15%, 20%, 25%, 30%.

Key parameters:

  • -f, --file: structured log file to purge.
  • -r, --ranking: ranking result directory (single) or comma-separated directories (multiple).
  • -m, --model: corresponding base model directory (single) or comma-separated directories.
  • -s, --separator: purge separator value. With default percentage mode, it is treated as a purge ratio (%).
  • -b, --bottom: lower bound separator in recursive mode.
  • --recursive: enable recursive purge from bottom to separator.
  • -t, --template: template file used for preprocessing purged data.
  • --dataset: dataset type (HDFS/Raptor).
  • --percentage: whether separator is percentage (default True).
  • --reversed: reverse purge direction.
  • --both: execute both directions.
  • -g, --grouped: enable grouped random purge policy.

Evaluation(evl): Evaluate models

You can evaluate a specified model on a given dataset as follows:

python -m epurger evl \
  -m workspace/models/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/31b7ce21-Logsy \
  -d data/HDFS.log-par/HDFS_structured_test_0.2_anomaly_0.0_0/

Key parameters:

  • -m, --model: model directory to evaluate.
  • -d, --data: dataset directory for evaluation.
  • -k, --topk: top-k for next-event based evaluation logic.
  • --directory: if set, evaluate a directory of models instead of a single model.
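The -k option applies to DeepLog-style next-event evaluation: a sequence is flagged anomalous when the observed next event falls outside the model's k most probable predictions. A minimal sketch with a hypothetical prediction format:

```python
def is_anomalous(predicted_scores, actual_event, k):
    """DeepLog-style check: anomaly if the actual next event is
    not among the k most probable predicted events."""
    topk = sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k]
    return actual_event not in topk

# hypothetical next-event probabilities from a trained model
scores = {"E5": 0.55, "E9": 0.30, "E22": 0.10, "E11": 0.05}
normal = is_anomalous(scores, "E9", k=2)    # False: E9 is in the top-2
anomaly = is_anomalous(scores, "E22", k=2)  # True: E22 is outside the top-2
```

A larger k makes the check more permissive, trading recall for precision.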

Command Line Manual Examples

epurger is used to explain log-based anomaly detection models and mine feature significance.

Main subcommands:

  • parse (par)
  • preprocess (pre)
  • train (tra) / trains (tras)
  • explain (exp) / explains (exps)
  • rank (rnk)
  • purge (prg)
  • purgeDraw (prgD) / purgeDraws (prgDs)
  • evaluate (evl)

Use the command below for complete options and latest argument details:

python -m epurger -h
