Discovering Probabilistic Frequent Sequential Patterns in Uncertain Databases under Systolic Tree
|Related article at Pubmed, Scholar Google|
Uncertain data are intrinsic in many real-world applications such as mobile tracking and environment surveillance. Mining sequential patterns from imprecise data, such as those data arising from GPS trajectories and sensor readings are important for discovering hidden knowledge in such applications. We establish two uncertain sequence data models abstracted from many real-life applications involving uncertain sequence data, and formulate the problem of mining probabilistically frequent sequential patterns (or p-FSPs) from data that conform to our models. However, the number of possible worlds is extremely large, which makes the mining prohibitively expensive. Inspired by the famous systolic tree algorithm, we develop patterns that effectively avoids the problem of “possible worlds explosion”, and when combined with our pruning and validating methods, achieves even better performance. We also propose a fast validating method to further speedup by enabling the pattern within the boundary.