Archive for the ‘software’ Category

This is a summary of resources posted on [Corpora-List] in early 2014:

CMU-Cambridge Statistical Language Modeling toolkit

Sketch Engine

Lawrence Anthony’s AntConc



Software for the extraction of n-grams as well as patterns that are not consecutive (skipgrams). The software is written in C++ for speed and memory efficiency but comes with a Python binding for use from Python scripts. It also has a standalone CLI tool that can do what you want.

Maarten van Gompel
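
To make the difference between n-grams and skipgrams concrete, here is a minimal Python sketch. It is not the API of the toolkit described above: the function names `ngrams` and `skipgrams` are hypothetical, and it assumes the common "k-skip-n-gram" definition, in which a pattern may skip up to `max_skip` tokens in total.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, max_skip):
    """Return n-token patterns that skip at most max_skip tokens in total.

    Plain n-grams (zero skipped tokens) are included in the result.
    """
    found = []
    for start in range(len(tokens)):
        # Tokens that may follow the first one: n - 1 slots plus max_skip gaps.
        window = tokens[start + 1:start + n + max_skip]
        for rest in combinations(range(len(window)), n - 1):
            found.append((tokens[start],) + tuple(window[i] for i in rest))
    return found


tokens = "to be or not to be".split()
print(ngrams(tokens, 2))        # consecutive pairs only
print(skipgrams(tokens, 2, 1))  # pairs with at most one token skipped
```

On a real corpus you would count the patterns (for instance with `collections.Counter`) instead of listing them, and a dedicated tool will be far more memory-efficient than this sketch.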


One of those friends you really trust recently pointed me to ooVoo as an alternative to Skype. This is what you can read on their website:

ooVoo makes it easy to have a video call with those you want to keep connected with. Video chat face-to-face anytime with your ooVoo contacts or have a free browser-based online video call with friends who aren’t on ooVoo yet. Video call up to twelve friends, family members or colleagues at the same time – with video quality that’s like being face-to-face in the same room.

Well, I’m starting to use it and it looks pretty good to me. Sound quality is great, plus there are mobile apps and multi-party video conferencing. Good for your empty pockets in this economy!

Journal cover:

Authors and contents:

This can be accessed online here.

For more information on the IJES please visit


Posted: November 21, 2008 in analysis of language, Java, software

LingPipe is a suite of Java libraries for the linguistic analysis of human language.

Feature Overview

LingPipe’s information extraction and data mining tools:
track mentions of entities (e.g. people or proteins);
link entity mentions to database entries;
uncover relations between entities and actions;
classify text passages by language, character encoding, genre, topic, or sentiment;
correct spelling with respect to a text collection;
cluster documents by implicit topic and discover significant trends over time; and
provide part-of-speech tagging and phrase chunking.


LingPipe’s architecture is designed to be efficient, scalable, reusable, and robust. Highlights include:
Java API with source code and unit tests;
multi-lingual, multi-domain, multi-genre models;
training with new data for new tasks;
n-best output with statistical confidence estimates;
online training (learn-a-little, tag-a-little);
thread-safe models and decoders for concurrent-read exclusive-write (CREW) synchronization; and
character encoding-sensitive I/O.
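
Two of the ideas listed above, classifying text by language and training online (learn-a-little, tag-a-little), can be illustrated with a toy character n-gram classifier. This is a Python sketch written for this post, not LingPipe's Java API: the class `CharNgramClassifier` and its methods are hypothetical, and the scores it returns are add-one-smoothed log-scores, not calibrated confidence estimates.

```python
import math
from collections import Counter, defaultdict


class CharNgramClassifier:
    """Toy character n-gram classifier that is trained one example at a time."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # category -> n-gram counts
        self.vocab = set()                  # every n-gram seen in training

    def _grams(self, text):
        # Pad the start so that the first characters also form full n-grams.
        padded = " " * (self.n - 1) + text.lower()
        return [padded[i:i + self.n] for i in range(len(padded) - self.n + 1)]

    def train(self, category, text):
        """Add one training example; can be called at any time."""
        grams = self._grams(text)
        self.counts[category].update(grams)
        self.vocab.update(grams)

    def scores(self, text):
        """Return (category, log-score) pairs, best first."""
        v = len(self.vocab) + 1  # add-one smoothing, with a slot for unseen n-grams
        ranked = []
        for category, counts in self.counts.items():
            total = sum(counts.values())
            logp = sum(math.log((counts[g] + 1) / (total + v))
                       for g in self._grams(text))
            ranked.append((category, logp))
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


clf = CharNgramClassifier()
clf.train("english", "the quick brown fox jumps over the lazy dog")
clf.train("spanish", "el rápido zorro marrón salta sobre el perro perezoso")
print(clf.scores("the dog")[0][0])   # english
clf.train("spanish", "la casa es grande")  # learn a little more, then tag again
print(clf.scores("el perro")[0][0])  # spanish
```

Because `train` only updates counts, new examples can be mixed freely with calls to `scores`, which is the sense of "online training" illustrated here; a concurrent version would additionally need the kind of read/write locking mentioned in the list.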

CRAT is a tool that allows intuitive and easy queries on annotated language corpora. A distinctive feature of this tool is the possibility of carrying out searches based on criteria of very different kinds, or dimensions. CRAT not only supports the most basic query options, such as word forms and lemmas, but also offers mechanisms for morphosyntactic queries and for dimension-based queries that give the results more semantic granularity, such as the categories or dimensions in Biber (1988, 2003).

A further distinctive characteristic of this tool is the possibility of combining the multidimensionality feature of the queries with options to restrict searches to specific parts of the corpus according to a wide range of semantic parameters. This feature turns the tool into an element of information guidance and selection that facilitates text exploration in accordance with factors which are not directly present in the text. In other words, the user can select excerpts of the corpus which include specific meta-information about the producer of the fragment such as their mother tongue, their educational background or the number of years spent abroad.

Finally, the output resulting from the abovementioned exploratory possibilities linked to the multidimensionality features of the tool is displayed in a user-friendly way and complemented by statistical information that enables the establishment of comparisons from two different perspectives: inter-dimensional and intra-dimensional. Apart from these features, CRAT also offers the possibility of browsing through the output results in order to establish relations which are not readily apparent.
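
The combination described above, restricting a search to parts of the corpus by producer metadata and then querying the annotations, can be sketched in a few lines of Python. This is an illustration only and says nothing about how CRAT is implemented: the `Token` and `Fragment` types, the `select` and `query` functions, and the metadata keys `l1` and `years_abroad` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str   # word form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag


@dataclass
class Fragment:
    meta: dict    # information about the producer of the fragment
    tokens: list  # annotated tokens of the fragment


def select(corpus, **criteria):
    """Keep fragments whose metadata matches every given key/value pair."""
    return [f for f in corpus
            if all(f.meta.get(k) == v for k, v in criteria.items())]


def query(fragments, lemma=None, pos=None):
    """Return tokens matching the given lemma and/or part-of-speech tag."""
    return [t for f in fragments for t in f.tokens
            if (lemma is None or t.lemma == lemma)
            and (pos is None or t.pos == pos)]


corpus = [
    Fragment({"l1": "Spanish", "years_abroad": 2},
             [Token("She", "she", "PRON"), Token("went", "go", "VERB")]),
    Fragment({"l1": "German", "years_abroad": 0},
             [Token("He", "he", "PRON"), Token("goes", "go", "VERB")]),
]

hits = query(select(corpus, l1="Spanish"), lemma="go")
print([t.form for t in hits])  # ['went']
```

Keeping selection and querying as separate steps means the same query can be rerun over different subcorpora, which is what makes comparisons between groups of producers possible.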

Download the user’s manual here (Spanish)

Want to try CRAT? Send us an e-mail: pascualf at um dot es

CRAT is a tool for running searches easily and intuitively within annotated linguistic corpora. One feature that sets it apart is the possibility of searching by criteria of different kinds, or dimensions. Moreover, combining these criteria leads to multidimensional search queries (Biber 1988). As for dimensionality, the tool not only supports the most basic kind of search, by word form or lemma, but also provides mechanisms for morphosyntactic searches and for dimension-based searches that add considerably more semantic detail to the results, such as those based on the categories or dimensions proposed by Biber (1988, 2003).

Another distinguishing feature is the possibility of combining this multidimensional searching with mechanisms for selecting the parts of the corpus to be searched, using semantically rich parameters. This makes the tool a means of guiding and selecting information that allows the text to be explored according to factors not directly present in it; that is, corpus fragments can be selected using the meta-information available about the producer of the fragment, such as number of years abroad, language, level of education, L1, etc.

Finally, what unifies this exploratory capability with the multidimensional searches is that the tool delivers both the results, in a user-friendly form, and the statistical information needed to make comparisons from two perspectives: inter-dimensional and intra-dimensional. A further strength is the possibility of navigating the resulting information to establish relations that could not be discovered at first glance.

Download the user’s manual here.

If you want to use CRAT, send us an e-mail: pascualf at um dot es