Issue Info:
  • Year: 2022
  • Volume: 52
  • Issue: 3
  • Pages: 205-215
Measures:
  • Citations: 0
  • Views: 135
  • Downloads: 23
Abstract: 

Distance-based clustering methods categorize samples by optimizing a global criterion, which yields ellipsoidal clusters of roughly equal size. In contrast, density-based clustering techniques form clusters of arbitrary shape and size by optimizing a local criterion. Most of these methods have several hyper-parameters, and their performance depends heavily on how those hyper-parameters are set. Recently, a Gaussian Density Distance (GDD) approach was proposed to optimize local criteria in terms of the distance and density properties of samples. GDD can find clusters of different shapes and sizes without any free parameters. However, it may fail to discover the appropriate clusters because already-clustered samples interfere with the estimation of the density and distance properties of the remaining unclustered samples. Here, we introduce Adaptive GDD (AGDD), which eliminates this undesirable effect of clustered samples by adaptively updating the parameters during clustering. It is stable and can identify clusters of various shapes, sizes, and densities without adding extra parameters. Because the distance metric used to measure dissimilarity between samples can affect clustering performance, the effect of different distance measures on the method is also analyzed. Experimental results on several well-known datasets show the effectiveness of the proposed AGDD method compared with other well-known clustering methods.

Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

References: 0
Issue Info:
  • Year: 2022
  • Volume: 52
  • Issue: 4
  • Pages: 281-291
Measures:
  • Citations: 0
  • Views: 151
  • Downloads: 18
Abstract: 

Automatic topic detection has become unavoidable in social media analysis because of the large volume of text that users generate. Clustering-based methods are one of the most important and up-to-date categories of topic detection, and the goal of this research is a broad study of this category. The paper therefore examines the main components of clustering-based topic detection: embedding methods, distance metrics, and clustering algorithms. Transfer learning, and with it pretrained language models and word embeddings, has received considerable attention in recent years. Given the importance of embedding methods, the efficiency of five embedding methods, from earlier to recent ones, is compared in this paper. To conduct the study, two commonly used distance metrics and five important clustering algorithms in the field of topic detection are implemented by the authors. Because COVID-19 has been a trending topic on social networks in recent years, a dataset of one month of tweets collected with COVID-19-related hashtags is used. More than 7500 experiments are performed to determine the tunable parameters. Then all combinations of embedding methods, distance metrics, and clustering algorithms (50 combinations) are evaluated using the Silhouette metric. The results show that T5 strongly outperforms the other embedding methods, cosine distance is slightly better than the other distance metric, and DBSCAN is superior to the other clustering algorithms.
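The evaluation step described in this abstract, scoring each combination with the Silhouette metric, can be sketched as follows. This is a minimal illustration and not the authors' code: the four two-dimensional "embeddings", the two candidate labelings, and all function names are hypothetical stand-ins for real tweet embeddings and real clustering output.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def silhouette(points, labels, dist):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data standing in for document embeddings.
embeddings = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
candidate_labelings = {
    "grouped": [0, 0, 1, 1],      # clusters follow the two directions
    "interleaved": [0, 1, 0, 1],  # clusters cut across them
}
metrics = {"euclidean": euclidean, "cosine": cosine_distance}

# Score every (distance metric, labeling) combination.
results = {
    (metric_name, labeling_name): silhouette(embeddings, labels, dist)
    for metric_name, dist in metrics.items()
    for labeling_name, labels in candidate_labelings.items()
}
best = max(results, key=results.get)
```

In a full study the candidate labelings would come from running each clustering algorithm on each embedding, and `results` would hold one score per combination.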

References: 0
Issue Info:
  • Year: 2022
  • Volume: 7
  • Issue: 2
  • Pages: 1-22
Measures:
  • Citations: 0
  • Views: 156
  • Downloads: 15
Abstract: 

Purpose: Clustering combined with co-word analysis is a method for revealing relationships and links and for illustrating the intellectual structure of a scientific field. This research studies the intellectual structure of articles in the field of futures studies in Iran using co-word analysis. Method: The research is a descriptive-analytical, developmental study with a scientometric approach. The statistical population consists of 921 retrieved article records in the field of futures studies. Findings: Publication of articles in the field of futures studies in Iran has mostly shown positive growth, and the keywords scenario, Islamic Republic, and foresight are the most frequent in futures studies. Hierarchical clustering led to the formation of 8 clusters in this field, namely "ICT visions", "geographers who love the future", "knowledge development", "Futuristic higher education", "Future of Religion", "Regional Relations", "Strategic Foresight" and "Heavy Weight of Method". Conclusion: Given the high frequency of the keyword scenario, as well as its density and its relationships with other keywords, it can be concluded that the scenario is the dominant approach in futures studies. The resulting clusters also show that this research is highly varied, but that the future is still neglected in many areas.
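The counting step behind co-word analysis can be sketched as below. This is a minimal illustration under stated assumptions: the keyword lists are invented toy data (not the 921 records of the study), and `co_word_counts` is a hypothetical helper name. The resulting pair counts are the kind of input that a hierarchical clustering step would then group.

```python
from collections import Counter
from itertools import combinations

def co_word_counts(keyword_lists):
    """Count how often each pair of keywords appears in the same article."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates so each pair counts once per article.
        unique = sorted({k.lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts

# Hypothetical keyword lists, one per article.
articles = [
    ["scenario", "foresight", "higher education"],
    ["scenario", "foresight"],
    ["scenario", "ICT"],
]
pairs = co_word_counts(articles)
most_common_pair, frequency = pairs.most_common(1)[0]
```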

References: 0
Writer: 

Hashempour Sadeghian Armindokht | NEZAMABADI POUR HOSSEIN

Issue Info:
  • Year: 2015
  • Volume: 1
Measures:
  • Views: 167
  • Downloads: 72
Abstract: 

Text mining is a field that is considered an extension of data mining in general, also known as knowledge discovery in databases. In the context of text mining, document clustering is an unsupervised learning method for automatically placing similar documents of a corpus in the same group, called a cluster, and dissimilar documents in different groups. While hundreds of clustering algorithms exist, it is difficult to find a single clustering algorithm that can handle all types of cluster shapes and sizes, or even to decide which algorithm would be the best one for a particular data set. Each algorithm has its own approach for estimating the number of clusters, imposing a structure on the data, and validating the resulting clusters. The idea of combining different clusterings emerged as an approach to overcome the weaknesses of single algorithms and further improve their performance. On the other hand, inspired by the law of gravitation, different clustering algorithms have been introduced, each of which attempts to cluster complex datasets. Gravitational Ensemble Clustering (GEC) is an ensemble method that employs the concepts of both gravitational clustering and ensemble clustering to reach a better clustering result. This paper presents an application of GEC to the problem of document clustering. The proposed method uses a modification of the original GEC algorithm that tries to produce a more varied clustering ensemble using a new parameter setting. Computational experiments were conducted to test the performance of the GEC approach on document datasets. The presented method obtained promising results in comparison with competing algorithms. ...

Author(s): 

Issue Info:
  • Year: 2021
  • Volume: 104
  • Issue: -
  • Pages: 0-0
Measures:
  • Citations: 1
  • Views: 55
  • Downloads: 0
Keywords: 
Abstract: 

References: 0
Author(s): 

Arefi Mohsen

Issue Info:
  • Year: 2025
  • Volume: 10
  • Issue: 3
  • Pages: 87-101
Measures:
  • Citations: 0
  • Views: 8
  • Downloads: 0
Abstract: 

In this paper, we present an approach for fitting clustering fuzzy linear regression models with fuzzy response variables and fuzzy parameters. We first introduce a method for clustering crisp/fuzzy data based on fuzzy c-means clustering, and then fit clustering fuzzy regression models based on the geometric mean. The optimal clustering fuzzy regression models are evaluated under two goodness-of-fit indices. Applications of the proposed approach are studied by modeling some real data sets.
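For readers unfamiliar with fuzzy c-means, on which the clustering step of this abstract is based, the following is a minimal sketch of the standard algorithm on one-dimensional crisp data. It is not the paper's method for fuzzy data or its regression step; the function name, the toy data, the initial centers, and the fuzzifier value `m=2.0` are all assumptions made for illustration.

```python
def fuzzy_c_means(xs, centers, m=2.0, iters=50):
    """Standard fuzzy c-means on 1-D crisp data (iters must be >= 1).

    xs: list of floats; centers: initial cluster centers; m: fuzzifier (> 1).
    Returns (centers, memberships), where memberships[k][i] is the degree
    to which xs[k] belongs to cluster i.
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in xs:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: give it full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Each center becomes the membership-weighted mean of the data.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, xs))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

# Hypothetical data with two well-separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
final_centers, final_memberships = fuzzy_c_means(data, centers=[0.0, 6.0])
```

Unlike hard clustering, every point keeps a membership degree in every cluster, and each row of `final_memberships` sums to 1.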

References: 0
Issue Info:
  • Year: 2014
  • Volume: 2
  • Issue: 4
  • Pages: 196-204
Measures:
  • Citations: 0
  • Views: 569
  • Downloads: 112
Abstract: 

In this paper, the problem of de-noising an image contaminated with Additive White Gaussian Noise (AWGN) is studied. This has been an open problem in signal processing for more than 50 years. Local methods suggested in recent years have obtained better results than global methods. However, with more intelligent training, in which, first, important data has more influence on training and, second, clustering is performed so that training blocks lie in low-rank subspaces, we can design a dictionary suitable for image de-noising and obtain results close to those of state-of-the-art local methods. In the present paper, we suggest a method based on global clustering of the blocks that make up the image. As the type of clustering plays an important role in clustering-based de-noising methods, we address two questions about the clustering: first, which parts of the data should be considered for clustering, and second, which data clustering method is suitable for de-noising. Clustering is then exploited to learn an overcomplete dictionary. The de-noised image is obtained from a sparse decomposition of the noisy image blocks in terms of the dictionary atoms. In addition to our framework, 7 popular dictionary learning methods are simulated and compared. The results are compared on two major factors: (1) de-noising performance and (2) execution time. Experimental results show that our dictionary learning framework outperforms its competitors on both factors.

References: 13
Issue Info:
  • Year: 2024
  • Volume: 16
  • Issue: 1
  • Pages: 20-45
Measures:
  • Citations: 0
  • Views: 8
  • Downloads: 0
Abstract: 

In this paper, we convert the consensus function problem of fuzzy clustering ensembles into an optimization problem based on the reliability-based co-association matrix. The problem minimizes the distance between the co-association matrix of the final clustering and the co-association matrices of the base clusterings in the ensemble. It has a nonlinear objective function with constraints, and we solve it by sparse sequential quadratic programming (SSQP).

References: 0
Issue Info:
  • Year: 2020
  • Volume: 17
  • Issue: 2 (44)
  • Pages: 85-100
Measures:
  • Citations: 0
  • Views: 498
  • Downloads: 0
Abstract: 

Clustering algorithms are highly dependent on factors such as the number of clusters, the specific clustering algorithm, and the distance measure used. Inspired by ensemble classification, one approach to reducing the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in this paper we propose a method for using weighting in the ensemble clustering problem. The accuracies of the base clusterings are estimated using an algorithm from the crowdsourcing literature called the agreement/disagreement (AD) method. This method exploits the agreements and disagreements between different labelers to estimate their accuracies. It assumes that different labelers have labeled a set of samples, so each pair of labelers has an agreement ratio over the samples they labeled. Under some independence assumptions, there is a closed-form formula for the agreement ratio between two labelers based on their accuracies. The AD method estimates the labelers’ accuracies by minimizing the difference between the parametric agreement ratio from the closed-form formula and the agreement ratio observed in the labels provided by the labelers. To adapt the AD method to the clustering problem, an agreement between two clusterings is defined as having the same opinion about a pair of samples: if two clusterings agree that two samples should be in the same cluster, or agree that they should be in different clusters, this is counted as an agreement. Then an optimization problem is solved to obtain the base clusterings’ accuracies such that the difference between their observed agreement ratios and the agreements expected from their accuracies is minimized.
To generate the base clusterings, we use four different settings: different clustering algorithms, different distance measures, distributed features, and different numbers of clusters. The clustering algorithms used are mean shift, k-means, mini-batch k-means, affinity propagation, DBSCAN, spectral, BIRCH, and agglomerative clustering with average and Ward linkage. For distance measures, we use correlation, city block, cosine, and Euclidean measures. In the distributed features setting, the k-means algorithm is run on 40%, 50%, … , and 100% of randomly selected features. Finally, for different numbers of clusters, we run the k-means algorithm with k equal to 2 and also to 50%, 75%, 100%, 150%, and 200% of the true number of clusters. We add the weights estimated by the AD algorithm to two well-known ensemble clustering methods, i.e., the Cluster-based Similarity Partitioning Algorithm (CSPA) and the Hyper Graph Partitioning Algorithm (HGPA). In CSPA, the similarity matrix is computed by taking a weighted average of the opinions of the different clusterings. In HGPA, we propose weighting the hyperedges by different values, such as the estimated clustering accuracies, the size of the clusters, and the silhouette of the clusterings. The experiments are performed on 13 real and artificial datasets. The reported evaluation measures include adjusted rand index, Fowlkes-Mallows, mutual index, adjusted mutual index, normalized mutual index, homogeneity, completeness, v-measure, and purity. The results show that in the majority of cases the proposed weighted method outperforms unweighted ensemble clustering. In addition, weighting is more effective in improving the HGPA algorithm than CSPA. Among the weighting methods proposed for the HGPA algorithm, the best average results are obtained when the accuracies estimated by the AD method are used to weight the hyperedges, and the worst results are obtained when the normalized silhouette measure is used for weighting.
Finally, among the different methods for generating base clusterings, the best results in weighted HGPA are obtained when different clustering algorithms are used to produce the base clusterings.
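Two ideas from this abstract can be sketched briefly: the pairwise notion of agreement between two clusterings, and a CSPA-style similarity matrix built as a weighted average of the clusterings' opinions. This is a minimal illustration, not the authors' implementation. The base clusterings and weights below are hypothetical, the function names are invented, and the optimization that estimates accuracies from agreement ratios is not shown.

```python
from itertools import combinations

def agreement_ratio(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair is an agreement when both clusterings put the two samples in the
    same cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def weighted_coassociation(clusterings, weights):
    """Weighted average of each clustering's same-cluster indicator matrix."""
    n = len(clusterings[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(clusterings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix

# Hypothetical base clusterings of four samples.
base = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, with different label names
    [0, 1, 1, 1],
]
weights = [0.5, 0.3, 0.2]  # assumed accuracy estimates, for illustration only

same_partition = agreement_ratio(base[0], base[1])
different_partition = agreement_ratio(base[0], base[2])
similarity = weighted_coassociation(base, weights)
```

Because agreement is defined on pairs of samples, it does not depend on how the clusters are named, which is why the first two clusterings agree completely.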

References: 0