## The Complexity of Mining Maximal Frequent Itemsets and Maximal Frequent Patterns

*Guizhen Yang*
### Abstract

Mining maximal frequent itemsets is one of the most fundamental
problems in data mining. In this paper we study the
complexity-theoretic aspects of maximal frequent itemset mining, from
the perspective of counting the number of solutions. We present the
first formal proof that the problem of counting the number of distinct
maximal frequent itemsets in a database of transactions, given an
arbitrary support threshold, is #P-complete, thereby providing
strong theoretical evidence that the problem of mining maximal
frequent itemsets is NP-hard. This result is of particular interest
since the associated decision problem of checking the existence of a
maximal frequent itemset is in P.

We also extend our complexity analysis to other similar data mining
problems dealing with complex data structures, such as sequences,
trees, and graphs, which have attracted intensive research interests
in recent years. Normally, in these problems a partial order among
frequent patterns can be defined in such a way as to preserve the
downward closure property, with maximal frequent patterns being those
without any successor with respect to this partial order. We
investigate several variants of these mining problems in which the
patterns of interest are subsequences, subtrees, or subgraphs, and
show that the associated problems of counting the number of maximal
frequent patterns are all either #P-complete or #P-hard.