Title
Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings
Type
Conference paper
Institution
UNIL/CHUV/Unisanté + partner institutions
Author(s)
Cocco, C.
Author
Publisher
Association for Computational Linguistics
Book or conference title
Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Unit
Université d'Avignon
Address
Stroudsburg
ISBN
978-1-937284-19-0
Publication status
Published
Publication date
2012-04
First page
55
Last page/article number
63
Peer-reviewed
Yes
Language
English
Notes
Online conference proceedings
Abstract
To cluster textual sequence types (discourse types/modes) in French texts, K-means algorithm with high-dimensional embeddings and fuzzy clustering algorithm were applied on clauses whose POS (part-of-speech) n-gram profiles were previously extracted. Uni-, bi- and trigrams were used on four 19th century French short stories by Maupassant. For high-dimensional embeddings, power transformations on the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, contrasting the use of bi- and trigrams whose performance is disappointing, possibly because of feature space sparsity.
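As a rough illustration of the pipeline the abstract outlines, the sketch below builds POS n-gram profiles for clauses, computes pairwise chi-squared dissimilarities between them, and applies a power transformation. It is not the paper's implementation: the function names, the tag labels, the toy clauses, and the exponent `q = 0.5` are all assumptions made for the example, and the clustering step itself (K-means or fuzzy clustering) is left out.

```python
from collections import Counter
from itertools import combinations


def ngram_profile(tags, n):
    """Count the POS n-grams occurring in one clause (a list of tags)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def chi2_dissimilarities(profiles):
    """Pairwise chi-squared dissimilarities between clause profiles.

    Assumes every profile is non-empty (each clause has at least n tags).
    """
    features = sorted({g for p in profiles for g in p})
    total = sum(sum(p.values()) for p in profiles)
    # Relative weight of each n-gram over the whole set of clauses.
    weight = {g: sum(p[g] for p in profiles) / total for g in features}
    # Each clause expressed as a distribution over n-grams.
    rows = [{g: p[g] / sum(p.values()) for g in features} for p in profiles]
    size = len(profiles)
    dist = [[0.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        d = sum((rows[i][g] - rows[j][g]) ** 2 / weight[g] for g in features)
        dist[i][j] = dist[j][i] = d
    return dist


def power_transform(dist, q):
    """Raise every dissimilarity to the exponent q (assumed q > 0)."""
    return [[d ** q for d in row] for row in dist]


# Hypothetical tagged clauses; the tag names are illustrative only.
clauses = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["PRON", "VERB", "ADV"],
]
profiles = [ngram_profile(c, 1) for c in clauses]
dist = chi2_dissimilarities(profiles)
transformed = power_transform(dist, 0.5)
```

Switching `n` to 2 or 3 in `ngram_profile` gives bigram or trigram profiles; with short clauses most n-gram counts are then zero, which is the kind of sparsity the abstract mentions as a possible cause of the weaker bi- and trigram results.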
PID Serval
serval:BIB_5A2CBDB06CA2
Creation date
2012-08-22T12:18:03.554Z
Creation date in IRIS
2025-05-20T14:08:52Z
File(s)
Name
BIB_5A2CBDB06CA2.P001.pdf
Manuscript version
preprint
Size
1.77 MB
Format
Adobe PDF
PID Serval
serval:BIB_5A2CBDB06CA2.P001
URN
urn:nbn:ch:serval-BIB_5A2CBDB06CA20
Checksum
(MD5):2c3e5748a6d996b027d1d6d9c570a727