Distributed and scalable sequential pattern mining through stream processing

doi:10.1007/s10115-017-1037-1

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chen, Chun-Chieh	en_US
dc.contributor.author	Shuai, Hong-Han	en_US
dc.contributor.author	Chen, Ming-Syan	en_US
dc.date.accessioned	2018-08-21T05:54:30Z	-
dc.date.available	2018-08-21T05:54:30Z	-
dc.date.issued	2017-11-01	en_US
dc.identifier.issn	0219-1377	en_US
dc.identifier.uri	http://dx.doi.org/10.1007/s10115-017-1037-1	en_US
dc.identifier.uri	http://hdl.handle.net/11536/146046	-
dc.description.abstract	Scalability is a primary issue in existing sequential pattern mining algorithms for dealing with a large amount of data. Previous work, namely sequential pattern mining on the cloud (SPAMC), has already addressed the scalability problem. It supports the MapReduce cloud computing architecture for mining frequent sequential patterns on large datasets. However, this existing algorithm does not address the iterative mining problem, namely that reloading data incurs additional costs. Furthermore, it does not study the load balancing problem. To remedy these problems, we devised a powerful sequential pattern mining algorithm, the sequential pattern mining in the cloud-uniform distributed lexical sequence tree algorithm (SPAMC-UDLT), exploiting MapReduce and streaming processes. SPAMC-UDLT dramatically improves overall performance without launching multiple MapReduce rounds and provides perfect load balancing across machines in the cloud. The results show that SPAMC-UDLT can significantly reduce execution time, achieve extremely high scalability, and provide much better load balancing than existing algorithms in the cloud.	en_US
dc.language.iso	en_US	en_US
dc.subject	Sequential pattern mining	en_US
dc.subject	Data mining	en_US
dc.subject	Cloud computing	en_US
dc.subject	MapReduce	en_US
dc.subject	Big data	en_US
dc.subject	Streaming MapReduce	en_US
dc.title	Distributed and scalable sequential pattern mining through stream processing	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1007/s10115-017-1037-1	en_US
dc.identifier.journal	KNOWLEDGE AND INFORMATION SYSTEMS	en_US
dc.citation.volume	53	en_US
dc.citation.spage	365	en_US
dc.citation.epage	390	en_US
dc.contributor.department	電機工程學系	zh_TW
dc.contributor.department	Department of Electrical and Computer Engineering	en_US
dc.identifier.wosnumber	WOS:000409892300003	en_US
Appears in Collections:	Articles