Full metadata record
DC Field | Value | Language
dc.contributor.author | 陈勇成 | en_US
dc.contributor.author | Yeong-Cheng Chen | en_US
dc.contributor.author | 张志永 | en_US
dc.contributor.author | Jyh-Yeong Chang | en_US
dc.date.accessioned | 2014-12-12T01:40:54Z | -
dc.date.available | 2014-12-12T01:40:54Z | -
dc.date.issued | 2003 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009112561 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/45168 | -
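dc.description.abstract | In the past, nearest neighbor algorithms were typically applied to data whose attributes are all numeric. With such attributes, instances are treated as points, and the distance between them follows a standard definition (for example, Euclidean distance). In symbolic domains, the feature space usually requires more elaborate treatment: nearest neighbor algorithms for symbolic attribute spaces use specific distance tables to produce real-valued distances between instances, and assign weights to certain effective or reliable instances to further adjust the structure of the feature space.

In this thesis, we propose an effective symbolic prototype learning method for symbolic domains. Derived from the minimum distance classifier, it learns to handle symbolic attributes and attribute weights and finds one symbolic prototype for each class, so that classification can be performed by the symbolic nearest mean classifier.

Besides learning one symbolic prototype per class as described above, we can also take into account the components of all prototypes within the same class. This lets us design a fuzzy symbolic prototype for each class and classify with the symbolic nearest mean classifier with fuzzy prototype.

We applied these algorithms to three machine learning problems (two of them from bioinformatics): lens recognition, recognition of promoter gene sequences, and splice-junction determination, and obtained excellent classification accuracy on all of them. Compared with other learning algorithms under different evaluation methods, our algorithms outperform or match them on all three test domains; moreover, they are simple and fast to train. Finally, the simulation results show that the nearest neighbor algorithm and its extensions have an advantage in recognizing symbolic attribute data. | zh_TW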
dc.description.abstract | In the past, nearest neighbor algorithms for learning from examples have worked very well in domains in which all features have numeric values. In such domains, the examples can be treated as points, and distance metrics can be defined in standard ways, such as Euclidean distance. In symbolic domains, a more sophisticated treatment of the feature space is required. The nearest neighbor algorithm for symbolic feature spaces computes distance tables that yield real-valued distances between instances, and attaches weights to the instances to further modify the structure of the feature space.
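
The abstract does not give the exact form of these distance tables. A common choice, assumed here, is the modified Value Difference Metric used by PEBLS: the distance between two values of one feature is the sum over classes of |P(c | a) - P(c | b)|^r, and the distance between two instances sums those table entries over all features. The sketch below builds the tables and runs plain 1-NN; the function names and toy data are hypothetical, instance weighting is omitted, and every query value is assumed to occur in the training set.

```python
from collections import Counter

def vdm_tables(X, y, r=1):
    """Build one value-distance table per feature (modified VDM, as in PEBLS).

    X: list of tuples of symbolic values; y: list of class labels.
    Entry (a, b) of table j is sum_c |P(c | x_j = a) - P(c | x_j = b)| ** r.
    """
    classes = sorted(set(y))
    tables = []
    for j in range(len(X[0])):
        value_count = Counter(row[j] for row in X)            # N(x_j = a)
        joint = Counter((row[j], c) for row, c in zip(X, y))  # N(x_j = a, class = c)
        table = {}
        for a in value_count:
            for b in value_count:
                table[(a, b)] = sum(
                    abs(joint[(a, c)] / value_count[a] - joint[(b, c)] / value_count[b]) ** r
                    for c in classes
                )
        tables.append(table)
    return tables

def distance(u, v, tables):
    """Real-valued distance between two instances: sum of per-feature table entries."""
    return sum(table[(a, b)] for table, a, b in zip(tables, u, v))

def nn_classify(query, X, y, tables):
    """1-nearest-neighbour; ties go to the earliest training instance."""
    best = min(range(len(X)), key=lambda i: distance(query, X[i], tables))
    return y[best]

X = [("a", "c"), ("a", "g"), ("t", "c"), ("t", "g")]
y = ["+", "+", "-", "-"]
tables = vdm_tables(X, y)
print(nn_classify(("t", "g"), X, y, tables))  # prints -
```

In this toy set the first feature separates the classes perfectly, so its table gives a distance of 2 between "a" and "t", while the second feature carries no class information and gives a distance of 0 between "c" and "g".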

In this thesis, we present an empirical analysis of symbolic prototype learners for discrete domains. Our symbolic prototype learner is derived by modifying the minimum distance classifier to handle symbolic attributes and attribute weighting, and it learns one prototype for each class. Classification is then performed by the symbolic nearest mean (SNM) classifier.
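
The abstract does not say how a "mean" is formed over symbolic values. One interpretation, assumed here and not taken from the thesis, is to one-hot encode each attribute: the class mean of the one-hot codes is then the per-feature value-frequency distribution, and the minimum distance rule picks the class whose prototype is nearest in attribute-weighted squared Euclidean distance. The names `fit_prototypes`, `sq_distance`, `snm_classify` and the `weights` argument are hypothetical, and the attribute weights are taken as given rather than learned.

```python
from collections import Counter

def fit_prototypes(X, y):
    """One prototype per class: per-feature value frequencies (= mean of one-hot codes)."""
    prototypes = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        n = len(rows)
        prototypes[c] = [
            {v: k / n for v, k in Counter(row[j] for row in rows).items()}
            for j in range(len(X[0]))
        ]
    return prototypes

def sq_distance(x, proto, weights):
    """Attribute-weighted squared Euclidean distance between one-hot(x) and a prototype."""
    total = 0.0
    for xj, freq, w in zip(x, proto, weights):
        # ||e_xj - p||^2 = 1 - 2 * p[xj] + sum_u p[u]^2; unseen values have p[xj] = 0
        total += w * (1.0 - 2.0 * freq.get(xj, 0.0) + sum(p * p for p in freq.values()))
    return total

def snm_classify(x, prototypes, weights=None):
    """Minimum distance rule: return the class whose prototype is nearest."""
    if weights is None:
        weights = [1.0] * len(x)
    return min(prototypes, key=lambda c: sq_distance(x, prototypes[c], weights))

X = [("a", "c"), ("a", "g"), ("t", "c"), ("t", "g")]
y = ["+", "+", "-", "-"]
prototypes = fit_prototypes(X, y)
print(snm_classify(("a", "c"), prototypes))  # prints +
```

Training reduces to counting value frequencies once per class, and classification compares a query against one prototype per class instead of every stored instance, which is consistent with the training-speed and simplicity advantages the abstract claims.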

In addition to learning one prototype per class, we can consider the contributions of the component prototypes of all samples in each class. On this basis we design a fuzzy prototype approach and implement the symbolic nearest mean classifier with fuzzy prototypes (FSNM).
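
The abstract describes the fuzzy prototype only in outline. The sketch below shows one hypothetical way to let every sample in a class contribute with a graded weight: each sample receives a membership of 1 / (1 + d), where d is its squared distance to the crisp class prototype (the per-feature value frequencies of the class), and the fuzzy prototype is the membership-weighted value-frequency distribution. The membership formula and all function names are assumptions for illustration, not the thesis's FSNM definition.

```python
from collections import defaultdict

def weighted_prototype(rows, weights):
    """Per-feature value frequencies, with each row counted according to its weight."""
    total = sum(weights)
    proto = []
    for j in range(len(rows[0])):
        freq = defaultdict(float)
        for row, w in zip(rows, weights):
            freq[row[j]] += w
        proto.append({v: s / total for v, s in freq.items()})
    return proto

def sq_distance(x, proto):
    """Squared Euclidean distance between the one-hot code of x and a prototype."""
    return sum(1.0 - 2.0 * f.get(xj, 0.0) + sum(p * p for p in f.values())
               for xj, f in zip(x, proto))

def fit_fuzzy_prototypes(X, y):
    prototypes = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        crisp = weighted_prototype(rows, [1.0] * len(rows))
        # Hypothetical membership: samples close to the crisp prototype count more.
        memberships = [1.0 / (1.0 + sq_distance(row, crisp)) for row in rows]
        prototypes[c] = weighted_prototype(rows, memberships)
    return prototypes

def fuzzy_snm_classify(x, prototypes):
    """Nearest-mean rule applied to the fuzzy prototypes."""
    return min(prototypes, key=lambda c: sq_distance(x, prototypes[c]))

X = [("a", "c"), ("a", "c"), ("a", "g"), ("t", "g"), ("t", "g")]
y = ["+", "+", "+", "-", "-"]
prototypes = fit_fuzzy_prototypes(X, y)
print(fuzzy_snm_classify(("a", "g"), prototypes))  # prints +
```

Samples near the class center pull the prototype toward their values: in the "+" class above, the weight of "c" in the second feature rises above its crisp frequency of 2/3, while a class whose samples are all identical keeps its crisp prototype unchanged.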

We validate the proposed algorithms on three data sets that have been studied by machine learning researchers, two of which are bioinformatics problems: Lenses recognition, identification of DNA promoter sequences, and splice-junction determination. In experimental comparisons with other learning algorithms, our simulation results show that the proposed algorithms are superior or comparable in classification accuracy. In addition, our algorithms offer advantages in training speed, simplicity, and perspicuity. The experimental evidence is a promising sign for the continued development of nearest neighbor algorithms for symbolic data domains. | en_US
dc.language.iso | zh_TW | en_US
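dc.subject | symbolic attributes | zh_TW
dc.subject | symbolic prototype | zh_TW
dc.subject | nearest mean classifier | zh_TW
dc.subject | bioinformatics | zh_TW
dc.subject | machine learning | zh_TW
dc.subject | fuzzy | zh_TW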
dc.subject | k-NN | en_US
dc.subject | SNM | en_US
dc.subject | FSNM | en_US
dc.subject | Promoter | en_US
dc.subject | Splice | en_US
dc.subject | cross-validation | en_US
dc.title | Classification of Symbolic Data Using the Nearest Neighbor Algorithm and Its Application to Bioinformatics | zh_TW
dc.title | Nearest Neighbor Algorithm for Symbolic Data Set Classification and Its Application in Bioinformatics | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Electrical and Control Engineering | zh_TW
Appears in Collections: Thesis


Files in This Item:

  1. 256101.pdf
