This document describes the basic outline of the classification algorithm in the exception subsystem.
The goal of the classifier is to use exception metadata (class, message, stack trace, and cause) to assign an exception to a category that describes in which layer or component of the application the bug happened.
We consider the following categories:
These categories are provided as the TicketClass enum.
The algorithm relies on the idea that most Java libraries have well-defined namespaces, based on the packages their classes are placed in. We use an offline database of package names to decide which category a bug belongs to.
The database contains the package names of the 100 most popular libraries, plus WildFly-specific packages and their dependencies. Each package name is labeled with a category and a weight.
To classify an exception, we traverse its stack trace. For each stack trace element, we try to match the package name of its class against a package name in the package database; on a match, the element receives that package's label and weight.
After all stack trace elements have been traversed, we sum the weights per category, and the category with the maximum total weight becomes the category of the exception.
The most important parts of a stack trace are usually at the top, because that is where the exception was thrown, so we also give more value to labels near the top of the stack trace. As a result, elements that occur in nearly every stack trace (such as executors and thread-handling classes) are nearly ignored unless they are at the top.
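The traversal, summing, and top-of-stack emphasis described above can be sketched as follows. This is only an illustration under stated assumptions, not the subsystem's actual code: the class and method names are hypothetical, categories are plain strings standing in for TicketClass values, the package database is a simple map, the longest matching package prefix is assumed to win, and the halving of the position factor per frame is an assumed decay, since the text does not specify how much extra value the top frames receive.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of position-weighted stack trace classification. */
final class StackTraceClassifierSketch {

    /** Label and weight attached to one package name (assumed shape). */
    static final class PackageInfo {
        final String category;
        final double weight;

        PackageInfo(String category, double weight) {
            this.category = category;
            this.weight = weight;
        }
    }

    // Assumption: each frame counts half as much as the frame above it.
    private static final double DECAY = 0.5;

    private final Map<String, PackageInfo> packages;

    StackTraceClassifierSketch(Map<String, PackageInfo> packages) {
        this.packages = packages;
    }

    /** Returns the entry of the longest known package prefix of className, or null. */
    PackageInfo lookup(String className) {
        String candidate = className;
        int dot;
        // Strip one trailing segment at a time: first the class name, then sub-packages.
        while ((dot = candidate.lastIndexOf('.')) >= 0) {
            candidate = candidate.substring(0, dot);
            PackageInfo info = packages.get(candidate);
            if (info != null) {
                return info;
            }
        }
        return null;
    }

    /** Returns the category with the highest weighted sum, or null if no frame matched. */
    String classify(StackTraceElement[] stackTrace) {
        Map<String, Double> scores = new HashMap<>();
        double positionFactor = 1.0; // the top frame gets the full weight
        for (StackTraceElement frame : stackTrace) {
            PackageInfo info = lookup(frame.getClassName());
            if (info != null) {
                scores.merge(info.category, info.weight * positionFactor, Double::sum);
            }
            positionFactor *= DECAY;
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }
}
```

With this assumed decay, a single matching frame at the top of the stack outweighs two equally weighted executor frames directly below it, which is the effect the paragraph above aims for. In practice the array passed to classify would come from Throwable.getStackTrace().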
For efficient package lookup, we use a trie data structure. Each node of the trie contains one package token, together with the label and weight of the package whose name consists of the tokens on the path from the root to that node, in order.
This approach saves space, because packages that share a prefix share the corresponding nodes instead of each storing its full name.
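A minimal sketch of such a trie is shown below. It is an assumption-laden illustration rather than the actual implementation: the names PackageTrieSketch, Entry, insert, and find are hypothetical, categories are plain strings instead of TicketClass values, and the sketch is mutable for brevity, whereas the real structure is built by a builder and is immutable. Returning the deepest labeled node on the path (the longest matching package prefix) is also an assumed matching rule.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a trie keyed by package tokens. */
final class PackageTrieSketch {

    /** Label and weight stored at the node where a package name ends. */
    static final class Entry {
        final String category;
        final double weight;

        Entry(String category, double weight) {
            this.category = category;
            this.weight = weight;
        }
    }

    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        Entry entry; // null when no registered package ends at this node
    }

    private final Node root = new Node();

    /** Registers a package name such as "com.example.web" with its label and weight. */
    void insert(String packageName, String category, double weight) {
        Node node = root;
        for (String token : packageName.split("\\.")) {
            // Shared prefixes reuse existing nodes, which is where the space saving comes from.
            node = node.children.computeIfAbsent(token, t -> new Node());
        }
        node.entry = new Entry(category, weight);
    }

    /** Returns the entry of the longest registered package prefix of className, or null. */
    Entry find(String className) {
        Node node = root;
        Entry best = null;
        for (String token : className.split("\\.")) {
            node = node.children.get(token);
            if (node == null) {
                break; // no registered package continues along this path
            }
            if (node.entry != null) {
                best = node.entry; // remember the deepest labeled node seen so far
            }
        }
        return best;
    }
}
```

A lookup walks at most one node per token of the class name, so its cost depends on the depth of the package name rather than on the number of packages in the database.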
Package trie Builder
Immutable Package Trie implementation
Testing and adjusting weights
TODO: Collect a data set of exceptions from Stack Overflow and the JBoss forums.
 Wikipedia Trie