
Multivariate clustering techniques – a comparison based on rose (Rosa spp.)

By: Arya V Chandran.
Contributor(s): Vijayaraghava Kumar (Guide).
Material type: Book
Publisher: Vellayani, Department of Agricultural Statistics, College of Agriculture, 2018
Description: 134p.
Subject(s): Agricultural Statistics
DDC classification: 630.31
Dissertation note: MSc
Item type: Theses
Current location: KAU Central Library, Thrissur (Theses)
Collection: Reference Book
Call number: 630.31 ARY/MU
Status: Not For Loan
Barcode: 174485


The study entitled “Multivariate clustering techniques – a comparison based on rose
(Rosa spp.)” was undertaken to compare different clustering techniques, to identify the
suitable technique for different types of qualitative and quantitative data and to illustrate the
procedures using data from a field experiment on rose (Rosa spp.). Data on quantitative
and qualitative traits collected from a field experiment on “Characterization and genetic
improvement in Rose (Rosa spp.) through mutagenesis”, conducted during 2014-2017 at the
College of Agriculture, Vellayani and the Regional Agriculture Research Station (RARS),
Ambalavayal, Wayanad, were used for the study. Twenty-five cultivars each from the Hybrid Tea
and Floribunda groups of rose were evaluated. There were nine quantitative
characters and three qualitative characters. Statistical analyses were carried out with the
statistical packages SPSS, STATA, SAS, R and NTSYS.
Preliminary analysis of variance (ANOVA) for all quantitative characters under study
revealed significant differences among genotypes with respect to each character.
Multivariate analysis of variance (MANOVA) was carried out to test the significance of
differences among varietal means within each group. The results indicated differences
among the cultivar means for both groups with respect to all quantitative characters.
Linear discriminant functions developed using nine quantitative characters for each of
the groups were used to elucidate the differences between them. The average score obtained
was 11.01 for the Hybrid Tea type and -2.34 for the Floribunda type, with an overall average
of 4.38. Discriminant function analysis confirmed the difference between the two groups
under study.
Cluster analysis of the Hybrid Tea and Floribunda types was performed for
quantitative, qualitative and mixed data. The association measures used were Euclidean
distance, Squared Euclidean distance, Chebychev distance, City Block distance and
Mahalanobis D² for quantitative data; the Jaccard, Dice, Simple matching and Hamann’s
coefficients for qualitative data; and Gower’s measure for mixed data. Different methods such
as single linkage, complete linkage, Unweighted Pair Group Average Method (UPGMA),
Weighted Pair Group Average Method (WPGMA), Unweighted Pair Group Centroid Method
(UPGMC), Ward’s method, the modified Tocher method, k-means clustering and Principal
Component Analysis (PCA) were adopted for clustering the cultivars. The optimum number of
clusters was determined by the pseudo t² statistic for hierarchical clustering and by the
pseudo F statistic for k-means clustering. The SD (Scatterness-Distance) index was used to
test the validity of clustering based on quantitative data.
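
Four of the quantitative association measures named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names and the two trait vectors are hypothetical and are not taken from the thesis. Mahalanobis D² is omitted because it also needs the inverse of a covariance matrix.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two trait vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Euclidean distance without the square root."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def city_block(x, y):
    """Sum of absolute trait-wise differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    """Largest absolute trait-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# Hypothetical scores of two cultivars on three traits (illustrative only).
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

print(euclidean(a, b))          # 5.0
print(squared_euclidean(a, b))  # 25.0
print(city_block(a, b))         # 7.0
print(chebychev(a, b))          # 4.0
```

Squared Euclidean distance is a monotone transform of Euclidean distance, so the two measures rank every pair of cultivars in the same order.
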
Clustering based on qualitative data was carried out using seven characters, three of
which were qualitative traits and the other four quantitative characters converted to
qualitative traits. The Jaccard and Dice coefficients were used for binary data, while the
Simple matching and Hamann’s coefficients were used for multi-state data. The different
clustering techniques gave approximately the same results under Squared Euclidean distance
as under Euclidean distance. The Jaccard and Dice coefficients were found to be very similar:
the dendrograms did not differ in topology, only in branch length. The clustering patterns
under the Simple matching and Hamann’s coefficients were also similar.
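
The close agreement within each pair of coefficients follows from how they are defined. The sketch below uses the usual textbook definitions, which are assumed here rather than quoted from the thesis; the function names and the two trait profiles are hypothetical. With these definitions Dice = 2J / (1 + J), where J is the Jaccard coefficient, and Hamann = 2S - 1, where S is the Simple matching coefficient. Both are increasing transforms, so each pair orders the cultivar pairs identically.

```python
def jaccard(x, y):
    """Shared presences over traits present in at least one profile (binary data)."""
    both = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(x, y) if p == 1 or q == 1)
    return both / either  # undefined when neither profile has any presence

def dice(x, y):
    """Like Jaccard, but shared presences are counted twice (binary data)."""
    both = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    total = sum(x) + sum(y)
    return 2 * both / total

def simple_matching(x, y):
    """Proportion of characters on which the two profiles share a state."""
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return matches / len(x)

def hamann(x, y):
    """Matches minus mismatches, divided by the number of characters."""
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return (matches - (len(x) - matches)) / len(x)

# Hypothetical binary trait profiles of two cultivars (illustrative only).
x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 0]

j = jaccard(x, y)
s = simple_matching(x, y)
print(j, dice(x, y), 2 * j / (1 + j))  # Dice equals 2J / (1 + J), up to rounding
print(s, hamann(x, y), 2 * s - 1)      # Hamann equals 2S - 1
```
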
For both groups, single linkage clustering under the different distance measures
tended to create one or two clusters containing the majority of the genotypes, with the
remaining genotypes forming single- or two-member clusters. Single linkage clustering tends
to produce long, chain-like clusters rather than compact ones; that is, the algorithm suffers
from the chaining effect. Among the other clustering algorithms, the complete linkage method
and Ward’s method showed similar results under Squared Euclidean distance. The UPGMA,
WPGMA and UPGMC methods under Squared Euclidean distance gave comparable results.
UPGMA and WPGMA gave almost the same clustering pattern under the different distance
measures for qualitative and quantitative data. Results from k-means clustering were
comparable with those from hierarchical clustering, except for single linkage clustering. A
certain degree of similarity was observed between k-means and D² analysis, but not to the
extent observed between the other clustering methods.
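
The chaining effect can be reproduced on a toy example. The following is a naive agglomerative clustering sketch in pure Python, not the procedure run in the statistical packages used for the study; the function names and the seven one-dimensional points are hypothetical. The points are spaced with steadily growing gaps, so single linkage keeps extending one chain, while complete linkage splits the data into two compact groups.

```python
import math

def linkage_distance(c1, c2, points, method):
    """Distance between two clusters, each given as a list of point indices."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if method == "single":      # nearest pair of members
        return min(dists)
    if method == "complete":    # farthest pair of members
        return max(dists)
    if method == "average":     # mean over all member pairs (UPGMA-style)
        return sum(dists) / len(dists)
    raise ValueError(f"unknown method: {method}")

def agglomerate(points, n_clusters, method):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage_distance(clusters[p], clusters[q], points, method)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return sorted(sorted(c) for c in clusters)

# Hypothetical one-trait data with gaps of 1.0, 1.1, 1.2, 1.3, 1.4 and 1.5.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,), (7.5,)]

print(agglomerate(points, 2, "single"))    # [[0, 1, 2, 3, 4, 5], [6]]
print(agglomerate(points, 2, "complete"))  # [[0, 1, 2, 3], [4, 5, 6]]
```

Single linkage leaves one large chained cluster plus a one-member cluster, which matches the pattern described above for both rose groups.
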
Among the Hybrid Tea genotypes, H16 (Mary Jean) formed a cluster of its own under the
single linkage method using different distance measures for quantitative, qualitative and
mixed data. Under the complete linkage method, H7 (Alaine Souchen) and H25 (Josepha) fell in
the same cluster in clustering based on both quantitative and qualitative characters. H22
(Mom’s Rose) and H23 (Lois Wilson) fell in the same cluster under complete linkage, UPGMA
and WPGMA, except under Hamann’s coefficient; they were also grouped together under D²
analysis. Among the Floribunda genotypes, F2 (Tickled Pink) and F5 (Princess de Monaco) were
included in the same cluster under the UPGMA method for both quantitative and qualitative
data. F1 (Versailles) and F24 (Golden Fairy) also came under the same cluster, except for
the multi-state measures under UPGMA.
Clustering based on mixed data gave approximately the same results as clustering based
on quantitative data under the different clustering algorithms, except for single linkage
clustering. Comparison using the SD index indicated a high index value for clustering based
on Gower’s measure.
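
Gower’s measure combines quantitative and qualitative traits by scoring each trait on a 0 to 1 scale and averaging the scores. The sketch below assumes equal trait weights, no missing values and a non-zero range for every quantitative trait; the function name, trait values and ranges are hypothetical and not taken from the thesis.

```python
def gower_similarity(x, y, kinds, ranges):
    """Gower similarity between two records with mixed trait types.

    kinds[k] is "num" or "cat"; ranges[k] is the range of a numeric trait
    over all records (must be non-zero) and is ignored for "cat" traits.
    """
    scores = []
    for xk, yk, kind, rk in zip(x, y, kinds, ranges):
        if kind == "num":
            # Numeric trait: 1 minus the range-scaled absolute difference.
            scores.append(1.0 - abs(xk - yk) / rk)
        else:
            # Qualitative trait: 1 for the same state, 0 otherwise.
            scores.append(1.0 if xk == yk else 0.0)
    return sum(scores) / len(scores)

# Hypothetical cultivars: two quantitative and two qualitative traits.
kinds = ["num", "num", "cat", "cat"]
ranges = [4.0, 20.0, None, None]   # assumed ranges over the whole data set
cultivar_1 = [8.0, 30.0, "red", "strong"]
cultivar_2 = [6.0, 40.0, "red", "mild"]

similarity = gower_similarity(cultivar_1, cultivar_2, kinds, ranges)
print(similarity)        # 0.5
print(1.0 - similarity)  # 0.5, the corresponding dissimilarity for clustering
```
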
Single linkage, complete linkage and average linkage under different association
measures were compared using the SD index. The average linkage method under Squared
Euclidean distance was found to be the best for both types, with an SD index of 0.651 for
the Hybrid Tea type and 0.659 for the Floribunda type.
The clustering pattern observed in the PCA score plot was comparable with the pattern
obtained from quantitative data, especially with D² analysis. The contributions of characters
towards variance obtained from D² analysis and PCA were similar.
The study makes it possible to compare different methods and exclude inappropriate
ones. The groups formed by the modified Tocher method and PCA differed from those of the
other methods. The SD index indicated that UPGMA under Squared Euclidean distance is the
best for quantitative data.

Kerala Agricultural University Central Library
Thrissur-(Dt.), Kerala Pin:- 680656, India
Ph : (+91)(487) 2372219
E-mail: librarian@kau.in
Website: http://library.kau.in/