Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings

Cocco, C.

Title

Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings

Type

conference/symposium paper

Institution

UNIL/CHUV/Unisanté + partner institutions

Author(s)

Cocco, C.

Author

Links to people

Cocco, Christelle

Links to units

Sect.d'inform. et méthodes mathématiques

Publisher

Association for Computational Linguistics

Book or conference/symposium title

Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics

Unit

Université d'Avignon

Address

Stroudsburg

ISBN

978-1-937284-19-0

Editorial status

Published

Publication date

2012-04

First page

55

Last page/article number

63

Peer-reviewed

Yes

Language

English

Notes

Online conference proceedings

Abstract

To cluster textual sequence types (discourse types/modes) in French texts, K-means algorithm with high-dimensional embeddings and fuzzy clustering algorithm were applied on clauses whose POS (part-of-speech) n-gram profiles were previously extracted. Uni-, bi- and trigrams were used on four 19th century French short stories by Maupassant. For high-dimensional embeddings, power transformations on the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, contrasting the use of bi- and trigrams whose performance is disappointing, possibly because of feature space sparsity.
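As an illustration of the first steps the abstract names (POS n-gram profiles, chi-squared distances between clauses, and a power transformation of those distances), here is a minimal Python sketch. It is not the author's implementation: the function names, the toy tag sequences, the exponent `q` and the exact form of the transformation (raising each distance to the power `q`) are all assumptions made for this example. The embedding and the K-means or fuzzy clustering steps are left out.

```python
from collections import Counter
from itertools import combinations


def ngram_profile(tags, n):
    """Count the POS n-grams of one clause, given its sequence of tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def chi2_distances(profiles):
    """Squared chi-squared distances between rows of a clause-by-n-gram table.

    Every profile must be non-empty (clauses shorter than n should be
    filtered out beforehand, otherwise a row sum is zero).
    """
    features = sorted(set().union(*profiles))
    total = sum(sum(p.values()) for p in profiles)
    # Relative weight of each n-gram over the whole table (column margins).
    col_weight = {f: sum(p[f] for p in profiles) / total for f in features}
    row_sum = [sum(p.values()) for p in profiles]
    dist = {}
    for i, j in combinations(range(len(profiles)), 2):
        dist[i, j] = sum(
            (profiles[i][f] / row_sum[i] - profiles[j][f] / row_sum[j]) ** 2
            / col_weight[f]
            for f in features
        )
    return dist


def power_transform(dist, q):
    """Raise every distance to the power q (assumed form of the transformation)."""
    return {pair: d ** q for pair, d in dist.items()}


# Hypothetical tag sequences for three clauses (not taken from the corpus).
clauses = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["PRON", "VERB", "ADV"],
]
profiles = [ngram_profile(tags, 1) for tags in clauses]  # unigram profiles
dist = chi2_distances(profiles)
transformed = power_transform(dist, 0.5)
```

In this toy table the first two clauses have identical profiles, so their distance is zero, while the third clause differs from both. With bi- or trigrams the table gains many more columns with mostly zero counts, which is the sparsity the abstract mentions as a possible cause of the weaker results.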

Subjects

Discourse types

K-means

high-dimensional embe...

fuzzy clustering

PID Serval

serval:BIB_5A2CBDB06CA2

Permalink

https://iris.unil.ch/handle/iris/43146

Publisher URL

http://aclweb.org/anthology-new/E/E12/E12-3.pdf

Creation date

2012-08-22T12:18:03.554Z

Creation date in IRIS

2025-05-20T14:08:52Z

File(s)

Name

BIB_5A2CBDB06CA2.P001.pdf

Manuscript version

preprint

Size

1.77 MB

Format

Adobe PDF

PID Serval

serval:BIB_5A2CBDB06CA2.P001

URN

urn:nbn:ch:serval-BIB_5A2CBDB06CA20

Checksum

(MD5): 2c3e5748a6d996b027d1d6d9c570a727