Service List

Application Descriptions

Status of the underlying infrastructure components

Centrally offered services and tools must be reliable and available. The overview below gives a first impression of the current status of the infrastructure components. The interactive graphic shows which services are currently in operation.

WebLicht Dep Parsing EN UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.080 second response time

SRU/CQL-32 UP

www.bbaw.de

SRU/CQL OK: valid XML

OAI-PMH-12 UP

ivdnt.org

OAI-PMH OK: valid XML

SRU/CQL-17 UP

fedora.clarin-d.uni-saarland.de

SRU/CQL OK: valid XML

OAI-PMH-44 UP

portulanclarin.net

OAI-PMH OK: valid XML

ReSpa UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.115 second response time

OAI-PMH-4 UP

www.ids-mannheim.de

OAI-PMH OK: valid XML

OAI-PMH-21 UP

www.sfs.uni-tuebingen.de

OAI-PMH OK: valid XML

OAI-PMH-45 UP

phonotheque.mmsh.huma-num.fr

OAI-PMH OK: valid XML

Inkluz UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.114 second response time

Automatic Transcription of Dutch Speech Recordings (Wav file) UP

github.com

HTTP OK: HTTP/1.1 200 OK - 8938 bytes in 0.150 second response time

OAI-PMH-19 WARNING

ota.ahds.ac.uk

OAI-PMH WARNING: XSD validation failed :3:0:ERROR:SCHEMASV:SCHEMAV_CVC_PATTERN_VALID: Element '{http://www.openarchives.org/OAI/2.0/oai-identifier}repositoryIdentifier': [facet 'pattern'] The value 'ota.bodleian.ox.ac.uk ' is not accepted by the pattern '[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+'. :3:0:ERROR:SCHEMASV:SCHEMAV_CVC_DATATYPE_VALID_1_2_1: Element '{http://www.openarchives.org/OAI/2.0/oai-identifier}repositoryIdentifier': 'ota.bodleian.ox.ac.uk ' is not a valid value of the atomic type '{http://www.openarchives.org/OAI/2.0/oai-identifier}repositoryIdentifierType'.
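
The warning above comes down to a single trailing space: an XSD `pattern` facet has to match the whole value, and the reported value `'ota.bodleian.ox.ac.uk '` ends in whitespace. The following is a minimal Python sketch, assuming that `re.fullmatch` is a close enough stand-in for XSD pattern matching for this particular expression. The function name is illustrative and not part of any monitoring tool.

```python
import re

# Pattern copied from the validation message for repositoryIdentifier.
REPOSITORY_ID = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+")


def is_valid_repository_identifier(value: str) -> bool:
    """Return True if the whole value matches the pattern.

    XSD patterns are implicitly anchored, hence fullmatch rather than search.
    """
    return REPOSITORY_ID.fullmatch(value) is not None


reported = "ota.bodleian.ox.ac.uk "  # value as quoted in the warning, trailing space included
print(is_valid_repository_identifier(reported))          # False
print(is_valid_repository_identifier(reported.strip()))  # True
```

The OAI-PMH-43 entry further down reports the same kind of failure for `'repository.clarin.dk '`.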

handle resolving /11858/00-1778-0000-0005-896C-F?noredirect UP

HTTP OK: HTTP/1.1 200 - 2143 bytes in 0.414 second response time

SRU/CQL-SRU/FCS server UP

SRU/CQL OK: valid XML

SRU/CQL-23 WARNING

ivdnt.org

SRU/CQL WARNING: XSD validation failed :2:0:ERROR:SCHEMASV:SCHEMAV_CVC_ELT_1: Element '{http://www.loc.gov/standards/sru/}explainResponse': No matching global declaration available for the validation root.

SRU/CQL-38 UP

www.clarin.lv

SRU/CQL OK: valid XML

WebLicht Tokenization TUR UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.087 second response time

SRU/CQL-9 UP

www.bbaw.de

SRU/CQL OK: valid XML

handle document access /11858/00-1778-0000-0005-896C-F UP

HTTP OK: HTTP/1.1 301 - 183 bytes in 2.234 second response time

OAI-PMH-14 UP

www.keeleressursid.ee

OAI-PMH OK: valid XML

IMS Fedora Commons UP

141.58.160.14

HTTP OK: HTTP/1.1 200 OK - 4465 bytes in 0.117 second response time

SRU/CQL-24 UP

www.bbaw.de

SRU/CQL OK: valid XML

BASWebService UP

clarin.phonetik.uni-muenchen.de

HTTP OK: HTTP/1.1 200 200 - 261687 bytes in 7.412 second response time

OAI-PMH-32 UP

ilc4clarin.ilc.cnr.it

OAI-PMH OK: valid XML

OAI-PMH-53 WARNING

www.clarin.lv

OAI-PMH WARNING: XSD validation failed :5:0:ERROR:SCHEMASV:SCHEMAV_CVC_PATTERN_VALID: Element '{http://www.openarchives.org/OAI/2.0/oai-identifier}sampleIdentifier': [facet 'pattern'] The value '' is not accepted by the pattern 'oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+'. :5:0:ERROR:SCHEMASV:SCHEMAV_CVC_DATATYPE_VALID_1_2_1: Element '{http://www.openarchives.org/OAI/2.0/oai-identifier}sampleIdentifier': '' is not a valid value of the atomic type '{http://www.openarchives.org/OAI/2.0/oai-identifier}sampleIdentifierType'.

OAI-PMH-42 UP

clarin.eurac.edu

OAI-PMH OK: valid XML

HTTPS UP

134.94.199.149

HTTP OK: HTTP/1.1 302 Found - 506 bytes in 0.041 second response time

OAI-PMH-7 UP

www.bbaw.de

OAI-PMH OK: valid XML

WebLicht All In One (NL) UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.055 second response time

Shibboleth SP UP

HTTP OK: HTTP/1.1 302 Found - 1756 bytes in 0.047 second response time

OAI-PMH-2 UP

asv.informatik.uni-leipzig.de

OAI-PMH OK: valid XML

HTTP UP

134.94.199.149

HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.029 second response time

Concraft -> DependencyParser UP

zil.ipipan.waw.pl

HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.138 second response time

handle document access /11022/0000-0000-20E2-C UP

HTTP OK: HTTP/1.1 200 OK - 373128 bytes in 3.683 second response time

Iobber UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.121 second response time

Concraft -> Bartek -> NicolasSummarizer UP

zil.ipipan.waw.pl

HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.081 second response time

Handle resolve /10932/00-017B-E190-A83E-6F01-5?noredirect UP

clarin.ids-mannheim.de

HTTP OK: HTTP/1.1 200 - 2223 bytes in 0.551 second response time

SRU/CQL-33 UP

fedora.clarin-d.uni-saarland.de

SRU/CQL OK: valid XML

OAI-PMH-16 UP

nlp.pwr.wroc.pl

OAI-PMH OK: valid XML

SRU/CQL-6 UP

hdl.handle.net

SRU/CQL OK: valid XML

CLARIN VLO [UI][prod] UP

vlo.clarin.eu

HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.074 second response time

Concraft -> Sentipejd UP

iis.ipipan.waw.pl

HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.146 second response time

HTTPS CLARIN-D project wiki UP

141.58.160.14

HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.016 second response time

OAI-PMH-18 UP

talkbank.org

OAI-PMH OK: valid XML

OAI-PMH-31 UP

clarino.uib.no

OAI-PMH OK: valid XML

SRU/CQL-34 DOWN

www.korpus.cz

SRU/CQL CRITICAL: XML syntax error EntityRef: expecting ';', line 10, column 135 (, line 10)
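
An `EntityRef: expecting ';'` error is typically reported when a document contains a bare `&`, for example in a URL query string, instead of `&amp;`. That cause is an assumption here, since the monitoring output does not show the offending line. The sketch below uses Python's standard `xml.etree.ElementTree`, whose error wording differs from the message above, and the sample fragments are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments: the first has a raw '&' inside an attribute value.
broken = '<link href="http://example.org/sru?operation=explain&version=1.2"/>'
fixed = '<link href="http://example.org/sru?operation=explain&amp;version=1.2"/>'

print(is_well_formed(broken))  # False
print(is_well_formed(fixed))   # True
```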

OAI-PMH-29 UP

clarino.uib.no

OAI-PMH OK: valid XML

SRU/CQL-39 UP

cocoon.huma-num.fr

SRU/CQL OK: valid XML

WCRFT2 UP

nlp.pwr.wroc.pl

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.115 second response time

SRU/CQL-35 UP

www.ids-mannheim.de

SRU/CQL OK: valid XML

SRU/CQL-3 UP

lindat.cz

SRU/CQL OK: valid XML

TermoPL UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.114 second response time

HTTP UP

HTTP OK: HTTP/1.1 301 Moved Permanently - 514 bytes in 0.025 second response time

OAI-PMH-62 UP

trolling.uit.no

OAI-PMH OK: valid XML

handle resolving /11022/0000-0000-20E2-C?noredirect UP

HTTP OK: HTTP/1.1 200 - 3040 bytes in 0.197 second response time

SRU/CQL-29 UP

fedora.clarin-d.uni-saarland.de

SRU/CQL OK: valid XML

SRU/CQL-44 UP

arche.acdh.oeaw.ac.at

SRU/CQL OK: valid XML

Spatial UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.113 second response time

NLP-HUB (multiple NER tools) UP

www.d4science.org

HTTP OK: HTTP/1.1 302 Found - 698 bytes in 0.175 second response time

DARIAH-DE Geo-Browser (KML) UP

de.dariah.eu

HTTP OK: HTTP/1.1 200 OK - 8640 bytes in 0.079 second response time

SRU/CQL-37 UP

ilc4clarin.ilc.cnr.it

SRU/CQL OK: valid XML

Morfeusz 2 UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.197 second response time

WebLicht Lemmas DE UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.053 second response time

CLARIN OAI-PMH Validator UP

clarin.eu

HTTP OK: HTTP/1.1 200 OK - 588 bytes in 0.307 second response time

OAI-PMH-57 WARNING

worldviews.gei.de

OAI-PMH WARNING: XSD validation failed :10:0:ERROR:SCHEMASV:SCHEMAV_CVC_PATTERN_VALID: Element '{http://www.openarchives.org/OAI/2.0/}adminEmail': [facet 'pattern'] The value 'gei-digital [at] leibniz-gei [dot] de' is not accepted by the pattern '\S+@(\S+\.)+\S+'. :10:0:ERROR:SCHEMASV:SCHEMAV_CVC_DATATYPE_VALID_1_2_1: Element '{http://www.openarchives.org/OAI/2.0/}adminEmail': 'gei-digital [at] leibniz-gei [dot] de' is not a valid value of the atomic type '{http://www.openarchives.org/OAI/2.0/}emailType'.

SRU/CQL-26 UP

www.bbaw.de

SRU/CQL OK: valid XML

WebLicht POSTags Lemmas EN UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.053 second response time

Distanbol WARNING

www.oeaw.ac.at

HTTP WARNING: HTTP/1.1 400 - 243 bytes in 0.121 second response time

OAI-PMH-28 UP

www.nb.no

OAI-PMH OK: valid XML

WebLicht POSTags Lemmas FR UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.051 second response time

NER NLTK UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.114 second response time

SRU/CQL-4 DOWN

lindat.cz

SRU/CQL CRITICAL: XML syntax error Opening and ending tag mismatch: body line 2 and font, line 3, column 16 (, line 3)

Concraft -> Nerf UP

zil.ipipan.waw.pl

HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.147 second response time

CLARIN-D project web site UP

141.58.160.14

HTTP OK: HTTP/1.1 200 OK - 66042 bytes in 0.166 second response time

SRU/CQL-36 DOWN

clarin.dk

SRU/CQL CRITICAL: HTTP Response502

SRU/CQL-30 UP

www.bbaw.de

SRU/CQL OK: valid XML

SRU/CQL-16 UP

fedora.clarin-d.uni-saarland.de

SRU/CQL OK: valid XML

OAI-PMH-10 UP

www.huygens.knaw.nl

OAI-PMH OK: valid XML

OAI-PMH-36 UP

www.cedifor.de

OAI-PMH OK: valid XML

WebLicht Lemmas EN UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.088 second response time

WebLicht Const Parsing EN UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.052 second response time

OAI-PMH-50 UP

clarin.is

OAI-PMH OK: valid XML

MorphoDiTa UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.115 second response time

OAI-PMH-56 UP

www.ortolang.fr

OAI-PMH OK: valid XML

CLARIN VCR [UI][prod] UP

clarin.ids-mannheim.de

HTTP OK: HTTP/1.1 200 OK - 2798 bytes in 0.042 second response time

HTTPS UP

HTTP OK: HTTP/1.1 200 OK - 492 bytes in 0.052 second response time

SRU/CQL-22 UP

ivdnt.org

SRU/CQL OK: valid XML

WebSty UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.113 second response time

WebLicht NamedEntities DE UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.053 second response time

LINDAT Translation UP

lindat.mff.cuni.cz

HTTP OK: HTTP/1.1 200 OK - 20750 bytes in 0.139 second response time

SRU/CQL-11 UP

www.ims.uni-stuttgart.de

SRU/CQL OK: valid XML

HTTP CLARIN-D project wiki UP

141.58.160.14

HTTP OK: HTTP/1.1 302 Found - 509 bytes in 0.012 second response time

OAI-PMH-55 UP

asv.informatik.uni-leipzig.de

OAI-PMH OK: valid XML

OAI-PMH-38 UP

www.ru.nl

OAI-PMH OK: valid XML

HTTP UP

134.94.199.148

HTTP OK: HTTP/1.1 301 Moved Permanently - 552 bytes in 0.021 second response time

SRU/CQL-12 UP

www.ids-mannheim.de

SRU/CQL OK: valid XML

MaltParser UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.113 second response time

OAI-PMH-63 UP

www.clarin.gr

OAI-PMH OK: valid XML

OAI-PMH-43 WARNING

clarin.dk

OAI-PMH WARNING: XSD validation failed :3:0:ERROR:SCHEMASV:SCHEMAV_CVC_PATTERN_VALID: Element '{http://www.openarchives.org/OAI/2.0/oai-identifier}repositoryIdentifier': [facet 'pattern'] The value 'repository.clarin.dk ' is not accepted by the pattern '[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+'. :3:0:ERROR:SCHEMASV:SCHEMAV_CVC_DATATYPE_VALID_1_2_1: Element '{http://www.openarchives.org/OAI/2.0/oai-identifier}repositoryIdentifier': 'repository.clarin.dk ' is not a valid value of the atomic type '{http://www.openarchives.org/OAI/2.0/oai-identifier}repositoryIdentifierType'.

WebLicht Const Parsing DE UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.052 second response time

OAI-PMH-37 UP

www.meertens.knaw.nl

OAI-PMH OK: valid XML

SRU/CQL-41 UP

portulanclarin.net

SRU/CQL OK: valid XML

NagVis access UP

134.94.199.148

HTTP OK: HTTP/1.1 302 Found - 1077 bytes in 0.007 second response time

OAI-PMH-48 UP

repository.de.dariah.eu

OAI-PMH OK: valid XML

OAI-PMH-46 UP

www.sadilar.org

OAI-PMH OK: valid XML

Automatic Transcription of Dutch Speech Recordings (Ogg file) UP

github.com

HTTP OK: HTTP/1.1 200 OK - 8938 bytes in 0.123 second response time

OAI-PMH-52 UP

clarin-belarus.corpus.by

OAI-PMH OK: valid XML

Fedora Commons repository UP

HTTP OK: HTTP/1.1 200 OK - 3841 bytes in 0.102 second response time

OAI-PMH-61 UP

arche.acdh.oeaw.ac.at

OAI-PMH OK: valid XML

Concraft UP

zil.ipipan.waw.pl

HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.152 second response time

WebLicht All In One (DE) UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.057 second response time

WebLicht Morphology DE UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.053 second response time

SRU/CQL-21 DOWN

nlp.pwr.wroc.pl

SRU/CQL CRITICAL: HTTP Response500

WebLicht Advanced Mode UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.051 second response time

Voyant Tools UP

voyant-tools.org

HTTP OK: HTTP/1.1 200 OK - 8120 bytes in 0.468 second response time

Topic UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.114 second response time

OAI-PMH-30 UP

cocoon.huma-num.fr

OAI-PMH OK: valid XML

SRU/CQL-5 UP

asv.informatik.uni-leipzig.de

SRU/CQL OK: valid XML

SRU/CQL-15 UP

fedora.clarin-d.uni-saarland.de

SRU/CQL OK: valid XML

WebLicht POSTags Lemmas DE UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.054 second response time

Concraft->Spejd UP

zil.ipipan.waw.pl

HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.081 second response time

Liner2 UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.116 second response time

Automatic Transcription of Dutch Speech Recordings (MP3 file) UP

github.com

HTTP OK: HTTP/1.1 200 OK - 8938 bytes in 0.105 second response time

OAI-PMH-59 UP

dans.knaw.nl

OAI-PMH OK: valid XML

OAI-PMH-54 UP

textgridrep.org

OAI-PMH OK: valid XML

TF-IDF UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.221 second response time

OAI-PMH-20 UP

clarin.dk

OAI-PMH OK: valid XML

HTTP UP

clarin.eu

HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.066 second response time

SRU/CQL-14 UP

www.ids-mannheim.de

SRU/CQL OK: valid XML

SRU/CQL-25 UP

www.bbaw.de

SRU/CQL OK: valid XML

SRU/CQL-13 UP

www.ids-mannheim.de

SRU/CQL OK: valid XML

HTTP UP

fedora.dwds.de

HTTP OK: HTTP/1.1 301 Moved Permanently - 534 bytes in 0.078 second response time

OAI-PMH-41 UP

lac.uni-koeln.de

OAI-PMH OK: valid XML

OAI-PMH-6 UP

fedora.clarin-d.uni-saarland.de

OAI-PMH OK: valid XML

HTTPS UP

clarin.eu

HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.142 second response time

UDPipe UP

ufal.mff.cuni.cz

HTTP OK: HTTP/1.1 200 OK - 36440 bytes in 0.066 second response time

Serel UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.117 second response time

WebLicht NamedEntities EN UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.051 second response time

SRU/CQL-8 UP

www.bbaw.de

SRU/CQL OK: valid XML

OAI-PMH-34 UP

spraakbanken.gu.se

OAI-PMH OK: valid XML

Spejd UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.114 second response time

OAI-PMH-60 UP

www.elararchive.org

OAI-PMH OK: valid XML

SRU/CQL-40 WARNING

www.sfs.uni-tuebingen.de

SRU/CQL WARNING: XSD validation failed :2:0:ERROR:SCHEMASV:SCHEMAV_CVC_ELT_1: Element '{http://www.openarchives.org/OAI/2.0/}OAI-PMH': No matching global declaration available for the validation root.

CMDI Explorer UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 200 OK - 1880 bytes in 0.091 second response time

Handle retrieve /10932/00-017B-E190-A83E-6F01-5 UP

clarin.ids-mannheim.de

HTTP OK: HTTP/1.1 302 - 546 bytes in 0.490 second response time

Concraft -> Bartek UP

zil.ipipan.waw.pl

HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.081 second response time

CLARIN DS status proxy [prod] UP

ws1-clarind.esc.rzg.mpg.de

HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.143 second response time

SRU/CQL-28 UP

spraakbanken.gu.se

SRU/CQL OK: valid XML

TeLeMaCo UP

HTTP OK: HTTP/1.1 200 OK - 6975 bytes in 0.119 second response time

Summarize UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.116 second response time

Web Server UP

HTTP OK: HTTP/1.1 200 OK - 9678 bytes in 0.061 second response time

OAI-PMH-13 UP

www.meertens.knaw.nl

OAI-PMH OK: valid XML

WebLicht Dep Parsing DE UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.051 second response time

WebLicht POSTags Lemmas IT UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.073 second response time

OAI-PMH-39 UP

www.tekstlab.uio.no

OAI-PMH OK: valid XML

Tagger NLTK UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.113 second response time

OAI-PMH-3 UP

hdl.handle.net

OAI-PMH OK: valid XML

SRU/CQL-7 UP

www.bbaw.de

SRU/CQL OK: valid XML

HTTPS UP

134.94.199.148

HTTP OK: HTTP/1.1 302 Found - 1077 bytes in 0.032 second response time

OAI-PMH-49 UP

zim.uni-graz.at

OAI-PMH OK: valid XML

OAI-PMH-51 UP

www.humlab.lu.se

OAI-PMH OK: valid XML

SRU/CQL-1 UP

www.sfs.uni-tuebingen.de

SRU/CQL OK: valid XML

WebLicht Morphology EN UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.051 second response time

OAI-PMH-47 UP

worldviews.gei.de

OAI-PMH OK: valid XML

OAI-PMH-27 UP

www.clarin.si

OAI-PMH OK: valid XML

CLARIN Centre Registry [UI][prod] UP

centres.clarin.eu

HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.155 second response time

WoSeDon UP

ws.clarin-pl.eu

HTTP OK: HTTP/1.1 200 OK - 658 bytes in 0.116 second response time

SRU/CQL-31 UP

www.bbaw.de

SRU/CQL OK: valid XML

OAI-PMH-OAI-PMH provider UP

OAI-PMH OK: valid XML

Sonatype Nexus UP

nexus.clarin.eu

HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.117 second response time

OAI-PMH-8 UP

www.corpora.uni-hamburg.de

OAI-PMH OK: valid XML

SRU/CQL-42 UP

www.ortolang.fr

SRU/CQL OK: valid XML

OAI-PMH-5 UP

archive.mpi.nl

OAI-PMH OK: valid XML

OAI-PMH-11 UP

lindat.cz

OAI-PMH OK: valid XML

OAI-PMH-17 UP

www.kielipankki.fi

OAI-PMH OK: valid XML

SRU/CQL-10 UP

www.corpora.uni-hamburg.de

SRU/CQL OK: valid XML

OAI-PMH-35 UP

www.polmine.de

OAI-PMH OK: valid XML

OAI-PMH-33 UP

www.clarin-lt.lt

OAI-PMH OK: valid XML

OAI-PMH-9 UP

www.ims.uni-stuttgart.de

OAI-PMH OK: valid XML

SRU/CQL-27 UP

fedora.clarin-d.uni-saarland.de

SRU/CQL OK: valid XML

WebLicht NamedEntities SL UP

weblicht.sfs.uni-tuebingen.de

HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.087 second response time

Data from monitoring.clarin.eu
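
The HTTP entries above share a fixed line shape (state, status line, body size, response time), so they can be turned into structured records. The sketch below assumes that this shape holds for every HTTP line. The regular expression and the `HttpCheck` name are illustrative and not part of the monitoring system.

```python
import re
from typing import NamedTuple, Optional


class HttpCheck(NamedTuple):
    state: str              # OK, WARNING or CRITICAL
    status_code: int        # e.g. 302
    reason: Optional[str]   # e.g. "Found"; absent in some entries
    size_bytes: int
    seconds: float


LINE = re.compile(
    r"^HTTP (?P<state>OK|WARNING|CRITICAL): HTTP/\d\.\d (?P<code>\d{3})"
    r"(?: (?P<reason>.*?))? - (?P<size>\d+) bytes in "
    r"(?P<secs>\d+\.\d+) second response time$"
)


def parse_http_check(line: str) -> Optional[HttpCheck]:
    """Parse one HTTP status line; return None for other check types."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return HttpCheck(m["state"], int(m["code"]), m["reason"],
                     int(m["size"]), float(m["secs"]))


print(parse_http_check(
    "HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.080 second response time"))
print(parse_http_check(
    "HTTP WARNING: HTTP/1.1 400 - 243 bytes in 0.121 second response time"))
```

Lines from other checks, such as `SRU/CQL OK: valid XML`, do not match and yield `None`, so they would need their own handling.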

List of analysis tools for processing research data accessible via CLARIAH-DE partners

Services available through the CLARIAH-DE tools are extremely diverse and are used in different scenarios. Their use always depends on the specific research context. Therefore, the individual components are shown here with a short description. The list will be continuously updated. In the research context, some services are also accessible via the other application-related menu items. The tools are provided by CLARIAH-DE partners and related institutions.
Alpino (Plaintext document (untokenised)) Annotating: Dutch
State: production

Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files (the latter are tokenised automatically using ucto). The output is a zip file containing XML files, one for each sentence in the input document.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU Lesser General Public License v2.1

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • application/zip: zip archive
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Gertjan van Noord (Rijksuniversiteit Groningen)

hoster

Rijksuniversiteit Groningen

usage restrictions for individual users

academic usage only

countries supported

all
Alpino (Plaintext tokenised input, one sentence per line) Annotating: Dutch
State: production

Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files (the latter are tokenised automatically using ucto). The output is a zip file containing XML files, one for each sentence in the input document.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU Lesser General Public License v2.1

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • application/zip: zip archive
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Gertjan van Noord (Rijksuniversiteit Groningen)

hoster

Rijksuniversiteit Groningen

usage restrictions for individual users

academic usage only

countries supported

all
Alpino Annotating: Dutch
State: development

Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files (the latter are tokenised automatically using ucto). The output is a zip file containing XML files, one for each sentence in the input document.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

LGPL 2.1

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • alpinooutput
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file
  • tokoutput

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Gertjan van Noord (University of Groningen), Maarten van Gompel (webservice only, CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Apache Stanbol Enhancer Enriching: English
State: development

Apache Stanbol provides a set of reusable components for semantic content management. A number of EnhancementEngines extract features from the submitted content; for details, see https://stanbol.apache.org. The resulting RDF enhancements are returned in JSON format.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English
  • text/plain: plain text file
Output
  • application/json: JSON data

application type

Web-UI

Fact sheet

contact

version

v1.0

application category

Research Software

application subcategory

Text Enhancement

research activity

Enriching

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Apache Foundation (software), Austrian Centre of Digital Humanities (enhancement chains and configuration)

hoster

Vienna, Austria

usage restrictions for individual users

academic usage only

countries supported

all
Ariadne Visual Media Service Publishing
State: development

The Visual Media Service provides easy publication and presentation of complex visual media assets on the web. It is an automatic service that lets users upload visual media files to a server and transforms them into an efficient web format, making them ready for web-based visualization.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • model/prs.ply

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

Visualisation of 3D models

research activity

Publishing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Visual Computing Lab of CNR-ISTI

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
Automatic Transcription of Dutch Speech Recordings (MP3 file) Speech Recognizing: Dutch
State: production

This webservice uses automatic speech recognition to provide the transcriptions of recordings spoken in Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Dutch
  • audio/mpeg
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Emre Yilmaz, Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Automatic Transcription of Dutch Speech Recordings (Ogg file) Speech Recognizing: Dutch
State: production

This webservice uses automatic speech recognition to provide the transcriptions of recordings spoken in Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Dutch
  • audio/vorbis
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Emre Yilmaz, Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Automatic Transcription of Dutch Speech Recordings (Wav file) Speech Recognizing: Dutch
State: production

This webservice uses automatic speech recognition to provide the transcriptions of recordings spoken in Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Dutch
  • audio/vnd.wave
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Emre Yilmaz, Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
CMDI Explorer Exploration
State: production

The Explorer helps you explore CMDI metadata and process the resources they describe.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • application/x-cmdi+xml
Output
  • application/zip: zip archive

application type

Web-UI

Fact sheet

contact

version

v1.0

application category

Research Software

application subcategory

Metadata Processing

research activity

Exploration

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

CLARIN-D Centre at the University of Tuebingen, Germany

hoster

Tuebingen, Germany

usage restrictions for individual users

academic usage only

countries supported

all
Colibri Core (FoLiA XML document) Analyzing
State: production

A tool for pattern extraction and analysis on corpus data.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/xml: XML file
Output
  • application/octet-stream: arbitrary binary data
  • text/csv: tabular data, comma-separated values
  • text/plain: plain text file

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

N-Gramming

research activity

Analyzing, Pattern Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Colibri Core (folia+xml) Analyzing: German, English, French…
State: development

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way.
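
To make the two pattern types concrete, here is a small pure-Python sketch. It does not use Colibri Core or its API; the function names and the `{*}` gap marker are choices made for this illustration, and only the simplest skipgram variant is shown (one fixed-size gap covering all inner tokens).

```python
from typing import List, Tuple

GAP = "{*}"  # placeholder for a skipped token (notation chosen for this sketch)


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous runs of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return n-grams with every inner token replaced by a gap.

    Only the first and last token of each n-gram are kept, so n must be >= 3.
    """
    if n < 3:
        return []
    return [(g[0],) + (GAP,) * (n - 2) + (g[-1],) for g in ngrams(tokens, n)]


tokens = "to be or not to be".split()
print(ngrams(tokens, 2))
print(skipgrams(tokens, 3))
```

For the sample sentence, the bigram `('to', 'be')` occurs twice, while a skipgram such as `('to', '{*}', 'or')` abstracts over whatever token fills the gap.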

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Dutch, English, German, French, Spanish, Portuguese, Western Frisian
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file
Output
  • Tadpole Columned Output Format
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file

application type

Web-UI

Fact sheet

contact

version

application category

Research Software

application subcategory

N-Gramming

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Colibri Core (Plain text input (tokenised)) Analyzing
Stateproduction

A tool for pattern extraction and analysis on corpus data.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/plainplain text file
Output
  • application/octet-streamarbitrary binary data
  • text/csvtabular data, comma-separated values
  • text/plainplain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

N-Gramming

research activity

Analyzing, Pattern Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Colibri Core (Plain text input (untokenised)) Analyzing: German, English, French…
Stateproduction

A tool for pattern extraction and analysis on corpus data.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languagesEnglish, Dutch, German, French, Spanish, Portuguese, Western Frisian
  • text/plainplain text file
Output
  • application/octet-streamarbitrary binary data
  • text/csvtabular data, comma-separated values
  • text/plainplain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

N-Gramming

research activity

Analyzing, Pattern Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Colibri Core (plain text) Analyzing: German, English, French…
Statedevelopment

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languagesDutch, English, German, French, Spanish, Portuguese, Western Frisian
  • text/plainplain text file
Output
  • Tadpole Columned Output Format
  • text/folia+xmlFormat for Linguistic Annotation (FoLiA) file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

N-Gramming

research activity

Analyzing, Pattern Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Maarten van Gompel (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Collection Registry Gathering: German
Stateproduction
key wordsCollection Catalog

The Collection Registry
- serves as a catalog of collections that were created within research projects or that serve as a basis for them;
- links data, their data models and the description of a collection for technical reuse by services such as search or analysis tools;
- also serves to manage collection descriptions. These can cover analog, protected or offline collections in addition to digitally accessible ones.

The purpose of the Collection Registry is
- to describe distributed collections in one place and to process them together in other services (e.g. Generic Search, CosmoTool);
- to make visible collections that are otherwise difficult to find;
- to document one's own collections and make them demonstrable to other scientists;
- to manage relevant collections in the sense of an internal catalog.

short description

documentation

Description of the target group and its size

Scholars who want to register and catalog collections.

formats and languages

Input
  • languagesGerman
  • application/xmlXML file, Schema
Output
  • json, application/xml

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

(not applicable)

application subcategory

Gathering

research activity

Gathering

data communication encryption

other

privacy policy

other

authentication

DARIAH Identity Provider

hoster

Fakultät Wirtschaftsinformatik und Angewandte Informatik, Lehrstuhl für Medieninformatik, Universität Bamberg, Bamberg, Germany

usage restrictions for individual users

academic usage only

countries supported

all
Concraft -> Bartek -> NicolasSummarizer Analyzing: Polish
Stateproduction

Java coreference-based summarization tool; its creation was cofunded by the European Union from resources of the European Social Fund -- Project PO KL 'Information technologies: Research and their interdisciplinary applications'. Part of: Multiservice, a robust linguistic Web service for Polish.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesPolish
  • text/plainplain text file
  • text/htmlHTML file
Output
  • application/jsonJSON data
  • CoNLL format
  • Visualization

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Text Summarization

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Computer Science, Polish Academy of Sciences, Poland

hoster

Warsaw, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Concraft -> Bartek Analyzing: Polish
Stateproduction

A statistical tool chain for performing Coreference Resolution. Part of: Multiservice, a robust linguistic Web service for Polish.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesPolish
  • text/plainplain text file
  • text/htmlHTML file
Output
  • application/jsonJSON data
  • CoNLL format
  • Visualization

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Coreference Resolution

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Computer Science, Polish Academy of Sciences, Poland

hoster

Warsaw, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Concraft -> DependencyParser Annotating: Polish
Stateproduction

The Polish dependency parser is trained on the extended version of the Polish dependency treebank (Składnica zależnościowa) with the publicly available parsing systems – MaltParser or MateParser. MaltParser is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. MateParser, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. Part of: Multiservice, a robust linguistic Web service for Polish.
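
The transition-based approach described above can be illustrated with a small sketch of the textbook arc-standard transition system. This system is an assumption chosen for illustration, not necessarily the algorithm the service configures in MaltParser, and the hand-written transition list stands in for the trained classifier that would normally predict each step.

```python
def arc_standard(words, transitions):
    """Apply SHIFT / LEFT-ARC / RIGHT-ARC steps and return (head, dependent) arcs.

    Tokens are numbered from 1; index 0 is an artificial root node.
    """
    stack = [0]
    buffer = list(range(1, len(words) + 1))
    arcs = set()
    for action in transitions:
        if action == "SHIFT":
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-topmost item becomes a dependent of the topmost item.
            if stack[-2] == 0:
                raise ValueError("the root cannot be a dependent")
            dependent = stack.pop(-2)
            arcs.add((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # The topmost item becomes a dependent of the item below it.
            dependent = stack.pop()
            arcs.add((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {action}")
    return arcs


words = ["She", "reads", "books"]
# In a real parser a classifier would predict these transitions one by one.
steps = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(sorted(arc_standard(words, steps)))  # [(0, 2), (2, 1), (2, 3)]
```

Each arc is a (head, dependent) pair, so the output reads: "reads" (2) is attached to the root, and "She" (1) and "books" (3) depend on "reads".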

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesPolish
  • text/plainplain text file
  • text/htmlHTML file
Output
  • application/jsonJSON data
  • CoNLL format
  • Visualization

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Computer Science, Polish Academy of Sciences, Poland

hoster

Warsaw, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Concraft -> Nerf Analyzing: Polish
Stateproduction

Statistical named entity recognition tool based on linear-chain conditional random fields. Part of: Multiservice, a robust linguistic Web service for Polish.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesPolish
  • text/plainplain text file
  • text/htmlHTML file
Output
  • application/jsonJSON data
  • CoNLL format
  • Visualization

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Analyzing, Named Entity Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Computer Science, Polish Academy of Sciences, Poland

hoster

Warsaw, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Concraft -> Sentipejd Analyzing: Polish
Stateproduction

A morphosyntactic tagger extended with a semantic category, expressing properties of positive or negative sentiment. Part of: Multiservice, a robust linguistic Web service for Polish.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesPolish
  • text/plainplain text file
  • text/htmlHTML file
Output
  • application/jsonJSON data
  • CoNLL format
  • Visualization

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Sentiment Analysis

research activity

Analyzing, Sentiment Analysis

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Computer Science, Polish Academy of Sciences, Poland

hoster

Warsaw, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Concraft Analyzing: Polish
Stateproduction

Morphosyntactic tagger for Polish based on constrained conditional random fields. Part of: Multiservice, a robust linguistic Web service for Polish.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesPolish
  • text/plainplain text file
  • text/htmlHTML file
Output
  • application/jsonJSON data
  • CoNLL format
  • Visualization

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Part-Of-Speech Tagging

research activity

Analyzing, POS-Tagging

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Computer Science, Polish Academy of Sciences, Poland

hoster

Warsaw, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Concraft -> Spejd Annotating: Polish
Stateproduction

Tool for partial parsing and rule-based morphosyntactic disambiguation. Part of: Multiservice, a robust linguistic Web service for Polish.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesPolish
  • text/plainplain text file
  • text/htmlHTML file
Output
  • application/jsonJSON data
  • CoNLL format
  • Visualization

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Shallow Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Computer Science, Polish Academy of Sciences, Poland

hoster

Warsaw, Poland

usage restrictions for individual users

academic usage only

countries supported

all
ConedaKOR Archiving
Stateproduction
key wordsWeb-based database system, Graph-based architecture

ConedaKOR facilitates the administration and presentation of academic collections of objects from the image-based cultural sciences and humanities. It lets you store arbitrary documents and interconnect them through relationships. You can build huge semantic networks for an unlimited number of domains. ConedaKOR integrates a sophisticated ontology management tool with an easy-to-use media database.

short description

documentation

Description of the target group and its size

The purpose of ConedaKOR is the administration and presentation of academic object collections from the image-based cultural sciences and humanities.

formats and languages

application type

Web-UI

network and security requirements

  • memory required4GB
  • processor2

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

(not applicable)

application subcategory

Archiving

research activity

Archiving, Publishing

source code available

authentication

Creators

hoster

DAASI, Tübingen, Germany

part of an application suite

ConedaKOR

usage restrictions for individual users

academic usage only

countries supported

all
Content Search Searching
Stateproduction

The CLARIN Content Search is a simple service that enables researchers to search for specific patterns across collections of data. The service is powered by a search engine that connects to the local data collections that are available in the centres. The data itself stays at the centre where it is hosted – therefore the underlying technique is called federated content search. The service summarizes and displays what is available. An easy next step is to go to the centre's specialised search interface to perform a more sophisticated query.
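
The federated idea, where the query travels to each centre and only hits travel back, can be sketched as follows. The centre names, the in-memory collections and the functions `search_centre` and `federated_search` are all hypothetical. The real service talks to remote endpoints, and none of these names are part of its API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical centres, each holding its own local collection.
CENTRES = {
    "centre-a": ["the quick brown fox", "a lazy dog"],
    "centre-b": ["foxes and hounds", "the fox ran"],
}


def search_centre(name, query):
    """Stand-in for a remote endpoint: search locally, return only the hits."""
    hits = [text for text in CENTRES[name] if query in text.split()]
    return name, hits


def federated_search(query):
    """Send the query to every centre in parallel and collect non-empty results."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda name: search_centre(name, query), CENTRES))
    return {name: hits for name, hits in results if hits}


print(federated_search("fox"))
# {'centre-a': ['the quick brown fox'], 'centre-b': ['the fox ran']}
```

The aggregator never copies the collections themselves. It only merges the per-centre hit lists, which mirrors the statement above that the data stays where it is hosted.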

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesgeneric
  • text/plainplain text file
Output
  • text/plainplain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Dictionary

research activity

Searching

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

https://contentsearch.clarin.eu/ws/fcs/2.0/aggregator/about

hoster

Utrecht, Netherlands

usage restrictions for individual users

academic usage only

countries supported

all
COSMAS II Searching
Statetesting

COSMAS II is a database (Corpus Search, Management and Analysis System) designed at the IDS for corpus-based research on language

- in extensive corpora (over 13 billion word forms, provided by the DEREKO project);
- in linguistically and structurally annotated corpora, e.g. word classes (over 1.7 billion nouns), headings, etc.;
- in user-defined corpus selections (based on up to eight bibliographic criteria);
- in corpora of different languages with custom tag sets, using an embedded graphical wizard;
- using numerous search, distance and range operators that allow simple to complex facts or grammatical patterns to be formulated.

The results are

- summarized and sorted according to bibliographic criteria;
- evaluated by frequency measures in terms of their distribution;
- analysed, sorted and tabulated using a co-occurrence analysis;
- sorted, analysed and presented as KWIC and supporting documents;
- (if desired) reduced to a representative, manageable quantity by means of a random generator.
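
Two of the result views listed above, KWIC (keyword in context) lines and random reduction of a hit list, can be sketched in a few lines of Python. This is a generic illustration and not COSMAS II's implementation; the function names `kwic` and `sample_hits`, the context width and the fixed seed are assumptions.

```python
import random


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every match."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def sample_hits(hits, k, seed=0):
    """Reduce a hit list to at most k randomly chosen entries (reproducible via seed)."""
    if k >= len(hits):
        return list(hits)
    return random.Random(seed).sample(hits, k)


tokens = "der Hund sieht den Hund im Garten".split()
for left, key, right in kwic(tokens, "Hund", width=2):
    # Align the keyword in a fixed column, as concordance displays usually do.
    print(f"{left:>12} [{key}] {right}")
```

With many hits, `sample_hits(kwic(tokens, "Hund"), 100)` would keep a random subset of at most 100 lines, a simple stand-in for the random reduction mentioned in the last list item.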

short description

documentation

Description of the target group and its size

Scholars of modern German linguistics

formats and languages

Input
  • text/plain+cosmas2COSMAS II query
Output
  • application/rtf
  • text/plainplain text file

Localization

German

application type

Web-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Searching

research activity

Searching

data communication encryption

other

privacy policy

other

authentication

Proprietary

Creators

  • Franck Bodmer Mory (Developer) [GND]
  • Helge Stallkamp (Developer) [GND]

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

CosmoTool Spatial Analysis: German
Statedevelopment
key wordsBiographical information

CosmoTool is a digital tool that combines biographical information from different sources into national and international movement profiles of historical personalities. These profiles are intended to support conclusions about characteristics and rules that can be regarded as international criteria. CosmoTool is based on the DARIAH-DE federation architecture and allows data to be extracted from unstructured text. CosmoTool is currently in development and still offers limited functionality.

short description

documentation

Description of the target group and its size

Scholars who want to research historical personalities.

formats and languages

Input
  • languagesGerman
  • application/xmlXML file
  • json
  • text/csv
Output
  • json

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Spatial Analysis

data communication encryption

link

privacy policy

link

authentication

DFN Identity Provider

hoster

Fakultät Wirtschaftsinformatik und Angewandte Informatik, Lehrstuhl für Medieninformatik, Universität Bamberg, Bamberg, Germany

usage restrictions for individual users

academic usage only

countries supported

all
CSTLemma (hosted by D4Science) Analyzing: English
Stateproduction

This is an experimental integration of a D4Science NLP processing service (CSTLemma). The CSTLemma Lemmatizer for English reduces all words in a text to their base form, the lemma.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
Output
  • text/csvtabular data, comma-separated values

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Lemmatization

research activity

Analyzing, Lemmatizing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Bart Jongejan (tool), D4Science staff (WAR upload)

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
Cyril Belica: Kookkurrenzdatenbank CCDB Analyzing
Statetesting
key wordsCollocation Analysis, Database

In a corpus-based empirical linguistic approach, it is of fundamental importance to conceive a methodology that is coherent in terms of scientific methodology and that makes it possible to systematically uncover, inventory, interpret and theoretically substantiate the emergent structures manifest in language use. As an empirical basis for this research project, a large collection of co-occurrence profiles for about 220,000 different lemmas was built up in the Programme Area Corpus Linguistics of the Leibniz Institute for the German Language based on a corpus of written contemporary language of about 2.2 billion running text words. For each lemma, the collection contains the results of up to five different co-occurrence analyses in the form of hierarchies of similar uses, with up to 100,000 examples of use per lemma and analysis.

Guided by the explorative analysis of this language material, we strive to gain new insights into the structures, regularities, properties and functions of language. Currently we focus on topics such as similarity of co-occurrence profiles and semantic proximity, on the interrelationships between local, lexical and global, situational contexts, and on various studies on quasi-synonymy.

Through this website we would like to make parts of our thinking and experimenting platform in the sense of a "transparent laboratory" accessible to all interested colleagues.

short description

documentation

Description of the target group and its size

Scholars of modern German linguistics

formats and languages

Input
  • text/plain; format-variant=ccdbCCDB query
Output
  • image/svg+xml
  • image/x-wmf
  • text/htmlHTML file

Localization

German

application type

Web-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Collocation Analysis

research activity

Analyzing

data communication encryption

other

privacy policy

other

authentication

Proprietary

Creators

Cyril Belica (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

D4Science NER (GATE's Annie) Analyzing: English
Statedevelopment

This is an experimental integration of a D4Science NLP processing service (based on GATE's ANNIE). This service identifies names of persons, locations, organizations, as well as money amounts, time and date expressions in English texts automatically.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
Output
  • application/xmlXML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Analyzing, Named Entity Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

D4Science staff

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Constituency Parsing DE Annotating: German
Stateproduction
key wordsParsing

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesGerman
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Constituency Parsing

research activity

Annotating, Parsing

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Constituency Parsing EN Annotating: English
Stateproduction
key wordsParsing

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Constituency Parsing

research activity

Annotating, Parsing

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Dependency Parsing DE Annotating: German
Stateproduction
key wordsParsing

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesGerman
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Dependency Parsing EN Annotating: English
Stateproduction
key wordsParsing

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Hyphenation DE Analyzing: German
Stateproduction
key wordsCleanup

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesGerman
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Cleanup

research activity

Analyzing

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Hyphenation EN Analyzing: English
Stateproduction
key wordsCleanup

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Cleanup

research activity

Analyzing

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Named Entity Recognition DE Annotating: German
Stateproduction
key wordsNER

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesGerman
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Annotating

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: Named Entity Recognition EN Annotating: English
Stateproduction
key wordsNER

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Annotating

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: POS-Tagging and Lemmatization DE Annotating: German
Stateproduction
key wordsPOS, Lemma

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesGerman
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Part-Of-Speech Tagging, Lemmatization

research activity

Annotating

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH DKPro-Wrapper: POS-Tagging and Lemmatization EN Annotating: English
Stateproduction
key wordsPOS, Lemma

The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
  • text/xmlXML file
Output
  • text/csvtabular data, comma-separated values

application type

Desktop cross-platform support

network and security requirements

  • memory required4GB
  • runtimeEnvironmentJava 1.8 or higher, 64bit

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Part-Of-Speech Tagging, Lemmatization

research activity

Annotating

source code available

authentication

Creators

hoster

part of an application suite

DARIAH-DE DKPro-Wrapper

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH-DE Geo-Browser (CSV) Data Visualization
Stateproduction

The DARIAH-DE Geo-Browser supports the comparative visualization of several requests and displays data in terms of their geographic and spatial relations at corresponding points in time and in sequences. Researchers can thus analyze space-time relations in data and collections of source material and establish correlations between them.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

LGPL-3.0-or-later

formats and languages

Input
  • text/csvtabular data, comma-separated values
  • text/comma-separated-value
  • application/vnd.dariahde.geobrowser.csv
Output
  • nonenone

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v3.6

application category

Research Software

application subcategory

Visualisation of Geographic Data

research activity

Data Visualization

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

DARIAH-DE

hoster

Göttingen, Germany

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH-DE Geo-Browser (KML) Analyzing
Stateproduction

The DARIAH-DE Geo-Browser supports the comparative visualization of several requests and displays data in terms of their geographic and spatial relations at corresponding points in time and in sequences. Researchers can thus analyze space-time relations in data and collections of source material and establish correlations between them.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

LGPL-3.0-or-later

formats and languages

Input
  • application/vnd.google-earth.kml+xml
  • application/vnd.google-earth.kmz
  • application/vnd.dariahde.geobrowser.kml+xml
Output
  • nonenone

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v3.6

application category

Research Software

application subcategory

Visualisation of Geographic Data

research activity

Analyzing, Data Visualization

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

DARIAH-DE

hoster

Göttingen, Germany

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH-DE GeoBrowser Discovering: German, English
Stateproduction
key wordsVisualisation and exploration

The DARIAH-DE Geo-Browser supports the comparative visualization of several requests and displays data in terms of their geographic and spatial relations at corresponding points in time and in sequences.

short description

documentation

Description of the target group and its size

formats and languages

Input
  • languagesGerman, English
  • text/csvtabular data, comma-separated values
  • application/vnd.google-earth.kml+xml
  • application/vnd.google-earth.kmz

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Visualization

research activity

Discovering

authentication

Creators

hoster

  • SUB, Göttingen Germany
  • GWDG, Göttingen Germany

part of an application suite

GeoBrowser

usage restrictions for individual users

academic usage only

countries supported

all
DARIAH-DE Publikator Publishing
Stateproduction
key wordsAdministration of research collections

The DARIAH-DE Publikator lets you prepare and manage research data and import it into the DARIAH-DE Repository.

short description

documentation

Description of the target group and its size

Researchers who want to digitally store and archive research data from the humanities and cultural studies.

formats and languages

Localization

German, English

application type

Web-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

(not applicable)

application subcategory

Editing

research activity

Publishing

authentication

DFN Identity Provider

Creators

hoster

  • Göttingen State and University Library (SUB), Göttingen Germany
  • Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), Göttingen Germany

usage restrictions for individual users

public

countries supported

all
DARIAH-DE Repository Publishing
Stateproduction
key wordsResearch Repository, Long-term archiving

The entry point for importing collections and data into the DARIAH-DE Repository is the DARIAH-DE Publikator, which allows you to prepare, manage, and finally import your collections into the DARIAH-DE Repository using your favorite internet browser.

short description

documentation

Description of the target group and its size

Scholars in the humanities who would like to edit, store, and publish their data in a sustainable environment.

formats and languages

Output
  • application/xml+tei
  • text/plainplain text file
  • application/epub+zip
  • text/htmlHTML file
  • application/zipzip archive

application type

Web-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

(not applicable)

application subcategory

Archiving

research activity

Publishing, Archiving

authentication

DFN Identity Provider

Creators

DARIAH-DE Association, Responsibilities

hoster

  • Göttingen State and University Library (SUB), Göttingen Germany
  • Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), Göttingen Germany

usage restrictions for individual users

public

countries supported

all
Data Modelling Environment (DME) Modeling: German
Stateproduction
key wordsMetadata modelling and mapping

The Data Modeling Environment (DME) from DARIAH-DE is a tool for modeling and associating data. A key feature of the DME is its research-oriented focus and the underlying concepts for making domain knowledge explicit.

short description

documentation

Description of the target group and its size

Scholars who want to model data structures and the relations between them.

licences

CC-BY

formats and languages

Input
  • languagesGerman
  • text/xmlXML file
  • text/json
  • text/csvtabular data, comma-separated values
  • text/plainplain text file
Output
  • text/xmlXML file
  • text/json
  • text/csvtabular data, comma-separated values
  • text/plainplain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

(not applicable)

application subcategory

Modeling

research activity

Modeling

data communication encryption

link

privacy policy

link

authentication

DARIAH Identity Provider

hoster

Fakultät Wirtschaftsinformatik und Angewandte Informatik, Lehrstuhl für Medieninformatik, Universität Bamberg, Bamberg, Germany

usage restrictions for individual users

academic usage only

countries supported

all
Deutsches Textarchiv Archiving: German
Stateproduction

The German Text Archive (DTA) is the largest single corpus of historical New High German covering the period from the 16th to the early 20th century, comprising more than 350 million tokens in 1.34 million digitized pages. Focusing mostly on (digitized) printed material, the DTA also includes a growing number of hand-written documents. Specialty sub-corpora include historical newspapers and other periodicals. The DTA as a whole covers a rich variety of fiction and non-fiction texts, the latter including academic as well as non-academic writing.

The DTA is composed of the so-called DTA-Kernkorpus (DTAK, “DTA Core Corpus”) with approximately 1500 first editions from the 16th through the 19th century. Additionally, the DTA-Erweiterungen (DTAE, “DTA Extensions”) module contains specialty corpora and individual texts which have been curated in the context of CLARIN-D and other projects. The full-text sources provided by digitization projects and other discipline-specific initiatives have been (manually or semi-automatically) converted to a TEI-compatible XML format conforming to the DTA-Basisformat (DTABf, “DTA Base Format”) guidelines, including extensive metadata on the original sources and data preparation. OCR texts in the DTA Core Corpus – as well as numerous additional text resources – have been manually corrected. A continuous quality assurance process is made possible by the collaborative web-based platform DTAQ, with around 2000 currently registered users. All DTA corpora are prepared for user consumption by automated computational linguistic analysis methods, including not only PoS-tagging and lemmatization, but also – among others – the orthographic normalization of historical spelling variants, allowing users to formulate queries in modern orthography.

short description

documentation

Description of the target group and its size

The DTA, as a text collection, publication and research platform, is aimed at both users and producers of research data from all areas of text-based research, in particular literary studies, German studies, general philologies, linguistics, computational linguistics, religious studies, church history, philosophy, educational research, historical studies, history of individual disciplines (e.g. medicine) and history of science in general.

formats and languages

Input
  • languagesGerman
  • application/xmlXML file, Schema
  • application/xmlXML file, Schema
Output
  • text/plainplain text file
  • application/xmlXML file, Schema
  • application/xmlXML file, Schema
  • application/xhtml+xmlXHTML file, Schema

Localization

German

application type

Web-UI

network and security requirements

  • operating systemLinux

Datenblatt (Fact sheet)

contact

  • technical contactwiegand@bbaw.de, Frank Wiegand (Developer)
  • subject matter contactAlexander Geyken (Arbeitsstellenleiter Digitales Wörterbuch der deutschen Sprache) [GND]

version

application category

Research Software

application subcategory

Archiving, Publishing

research activity

Archiving, Publishing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

hoster

Berlin-Brandenburgische Akademie der Wissenschaften, Berlin, Germany

usage restrictions for individual users

public

countries supported

all
Deutsches Textarchiv – Qualitätssicherung Archiving: German
Stateproduction

Collaborative Quality Assurance in the German Text Archive (DTA): DTAQ (Deutsches Textarchiv - Qualitätssicherung) is a web-based application for finding, categorizing and correcting various types of errors in XML/TEI annotated texts. Each user can customize the DTAQ interface to set up different views of the digitized source material and the text transcriptions.

DTAQ can be used freely by everyone after registration.

short description

documentation

Description of the target group and its size

The DTA (and thus also the DTABf), as a text collection, publication and research platform, is aimed at both users and producers of research data from all areas of text-based research, in particular literary studies, German studies, general philologies, linguistics, computational linguistics, religious studies, church history, philosophy, educational research, historical studies, history of individual disciplines (e.g. medicine) and history of science in general. The DTAQ platform enables registered users to comment on all texts within the corpus infrastructure of the DTA and to report any remaining errors via a versioned ticket system. 'Expert users' with appropriate rights can edit the text resources with the help of a text editor and an XML editor and thus contribute in many ways to the optimisation and deeper exploration of the common corpus base.

formats and languages

Input
  • languagesGerman
  • application/xmlXML file, Schema
  • application/xmlXML file, Schema
Output
  • text/plainplain text file
  • application/xmlXML file, Schema
  • application/xmlXML file, Schema
  • application/xhtml+xmlXHTML file, Schema

Localization

German

application type

Web-UI

network and security requirements

  • operating systemLinux

Datenblatt (Fact sheet)

contact

  • technical contactwiegand@bbaw.de, Frank Wiegand (Developer)
  • subject matter contactAlexander Geyken (Arbeitsstellenleiter Digitales Wörterbuch der deutschen Sprache) [GND]

version

application category

Research Software

application subcategory

Archiving, Publishing

research activity

Archiving, Publishing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

hoster

Berlin-Brandenburgische Akademie der Wissenschaften, Berlin, Germany

usage restrictions for individual users

public

countries supported

all
DGD – Datenbank für Gesprochenes Deutsch Analyzing
Stateproduction
key wordsCleanup

The DGD is the Database for Spoken German ("Datenbank für Gesprochenes Deutsch"). To use the DGD, you need to register (it's free). The DGD's user interface is in German. We are sorry we cannot provide a localized interface for other languages.

The DGD gives registered users access to 34 corpora of spoken language from the Archive for Spoken German ("Archiv für Gesprochenes Deutsch", AGD). The corpora comprise:

  • The Research and Teaching Corpus of Spoken German ("Forschungs- und Lehrkorpus Gesprochenes Deutsch", FOLK), a state-of-the-art corpus of spontaneous interaction data
  • The GeWiss Corpus ("Gesprochene Wissenschaftssprache Kontrastiv") of academic speech
  • Further interaction corpora, such as the Freiburger Korpus ("FR") and the corpus Dialogstrukturen ("DS")
  • The large "historic" dialect corpora of German, most importantly the corpus German dialects ("Deutsche Mundarten", "Zwirner-Korpus", ZW) and its "satellite corpora" German dialects in Eastern Europe (OS), German dialects in the Black Forest region (SV), German dialects in south-west Germany (SW), German dialects in the GDR (DR)
  • Other influential variation corpora for German, such as the corpus Basic German ("Deutsche Umgangssprachen", "Pfeffer-Korpus", PF) and the corpus Standard German ("Deutsche Standardsprache", "König-Korpus", KN), as well as the more recent corpus Deutsch Heute ("DH")
  • Corpora on extra-territorial varieties of German ("speech islands") such as Michael Clyne's corpus on Australian German, a corpus on German in Russia, a corpus on German in Namibia and a corpus on Mennonite Low German in the Americas
  • Anne Betten's corpora on the German of Emigrants to Israel ("Emigrantendeutsch in Israel", IS, ISW, ISZ)
  • Norbert Dittmar's corpus on German reunification ("Berliner Wendekorpus", BW)

Altogether, the DGD contains more than 4,000 hours of audio and video recordings, and more than 12 million transcribed tokens. With a few exceptions, all transcriptions in the database are time-aligned with the recordings and annotated with lemma and part-of-speech information.

short description

documentation

Description of the target group and its size

Scholars interested in spoken German

formats and languages

Input
  • text/plain; format-variant=dgdDGD corpus query
Output
  • text/csvtabular data, comma-separated values
  • application/xml; format-variant=elan-eafELAN annotation file (*.eaf)
  • application/xml; format-variant=exmaralda-exbEXMARaLDA Basic transcription (*.exb)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

other

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

Creators

Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

license only
DiaCollo Analyzing: German, English
Stateproduction

DiaCollo (pronounced /diːˈakəloʊ/, "dee-ah-kə-loh", analogous to the well-known juggling prop) is a tool for efficient extraction of diachronic collocations from an underlying text corpus. Unlike other collocation extractors such as DWDS Wortprofil, Sketch Engine, or the UCS toolkit, DiaCollo is suitable for extraction and analysis of diachronic collocation data, i.e. collocations whose significance depends on the date of their occurrence. By tracking changes in a word's typical collocates over time and applying J. R. Firth's famous principle that "you shall know a word by the company it keeps", DiaCollo can help to provide a clearer picture of diachronic changes in the word's usage, in particular those related to semantic shift.
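As a rough illustration of the idea of diachronic collocations described above, the following sketch groups a target word's neighbours by time slice. It is not DiaCollo's implementation: the function name `collocates_by_slice`, the decade-sized slices, the two-token window and the toy documents are all assumptions made for this example, and it uses raw co-occurrence counts where a real tool would apply an association score.

```python
from collections import Counter, defaultdict


def collocates_by_slice(docs, target, slice_years=10, window=2):
    """Count the words around `target`, grouped by time slice.

    docs: iterable of (year, tokens) pairs (hypothetical input format).
    Returns a dict mapping the first year of each slice to a Counter.
    """
    profiles = defaultdict(Counter)
    for year, tokens in docs:
        slice_start = year - year % slice_years
        for i, token in enumerate(tokens):
            if token != target:
                continue
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            profiles[slice_start].update(left + right)
    return dict(profiles)


# Toy data: the same word keeps different company in different periods.
docs = [
    (1805, ["the", "mouse", "ate", "cheese"]),
    (1995, ["click", "the", "mouse", "button"]),
]
profiles = collocates_by_slice(docs, "mouse")
for slice_start in sorted(profiles):
    print(slice_start, profiles[slice_start].most_common())
```

Comparing the per-slice profiles side by side is what makes a shift in usage visible; in this toy data "mouse" occurs with "ate" and "cheese" in the earlier slice and with "click" and "button" in the later one.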

short description

documentation

Description of the target group and its size

DiaCollo is particularly useful for researchers in the fields of history, political science, philology and linguistics. Changes in language that manifest themselves in a change of meaning can be researched with this tool. DiaCollo can find typical word combinations (collocations) for keywords within a certain time period. The search result can be filtered in various ways; search queries can be refined, and the results displayed and exported in several visual forms. The word combinations found can be used, for example, to analyze a change in the meaning of words or phrases. DiaCollo searches in digital text collections (corpora), which are available via the German Text Archive (DTA). The visualized search result is directly linked to the underlying corpus base, which makes the results scientifically traceable and verifiable.

licences

Perl 5 License

formats and languages

Input
  • languagesGerman, English
Output
  • text/plainplain text file
  • application/jsonJSON data, Schema
  • application/xhtml+xmlXHTML file, Schema
  • text/tab-separated-valuestabular data, tab-separated values

Localization

German, English

application type

Web-UI

network and security requirements

  • operating systemLinux

developer documentation

Datenblatt (Fact sheet)

contact

  • technical contactjurish@bbaw.de, Bryan Jurish (Developer) [GND]
  • subject matter contactBryan Jurish (Linguist) [GND]

maintenance documentation

https://metacpan.org/release/DiaColloDB

version

application category

Research Software

application subcategory

Collocation Analysis

research activity

Analyzing, Extracting

source code available

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

Creators

Bryan Jurish (Developer) [GND]

hoster

Berlin-Brandenburgische Akademie der Wissenschaften, Berlin, Germany

usage restrictions for individual users

public

countries supported

all
Distanbol Analyzing: English
Stateproduction

Distanbol analyses texts semantically. To do so, it passes the input text to an Apache Stanbol web service that executes an NLP chain yielding named entities. This is followed by Entity Linking on the text. The resulting enhancements are rendered as a human-readable HTML page. In short, Distanbol adds a human-readable rendering to the JSON-LD output produced by Stanbol.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • text/plainplain text file
Output
  • application/xhtml+xmlXHTML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Text Enhancement

research activity

Analyzing, Semantification

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Apache Foundation (software), Austrian Centre of Digital Humanities (enhancement chains and configuration)

hoster

Vienna, Austria

usage restrictions for individual users

academic usage only

countries supported

all
DTA-Basisformat Archiving: German
Stateproduction

The DTABf was developed in accordance with the P5 Guidelines of the Text Encoding Initiative (TEI). Because the TEI Guidelines offer solutions for a very large number of tagging requirements, they are extensive and flexible, and they are meant to be adjusted to the individual needs of projects working with the TEI. For the DTA this was achieved by creating the DTABf, a subset of the TEI/P5 tagset, which offers fixed sets not only of elements but also of corresponding attributes and (where applicable) values. The DTABf tagset is fully conformant with the TEI/P5 Guidelines, i.e. the TEI tagset was only reduced, not extended in any way.

short description

documentation

Description of the target group and its size

The DTA base format is part of the German Text Archive (DTA). The DTA (and thus also the DTABf), as a text collection, publication and research platform, is aimed at both users and producers of research data from all areas of text-based research, in particular literary studies, German studies, general philologies, linguistics, computational linguistics, religious studies, church history, philosophy, educational research, historical studies, history of individual disciplines (e.g. medicine) and history of science in general.

formats and languages

Input
  • languagesGerman
  • application/xmlXML file, Schema
  • application/xmlXML file, Schema
Output
  • application/xmlXML file, Schema
  • application/xmlXML file, Schema

Localization

German, English

application type

not applicable

network and security requirements

  • operating systemLinux

Datenblatt (Fact sheet)

contact

  • technical contacthaaf@bbaw.de, Susanne Haaf-Dumont (Developer) [GND]
  • subject matter contactAlexander Geyken (Arbeitsstellenleiter Digitales Wörterbuch der deutschen Sprache) [GND]

version

application category

Research Software

application subcategory

Archiving, Publishing, Annotating, Transcription

research activity

Archiving, Publishing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

hoster

Berlin-Brandenburgische Akademie der Wissenschaften, Berlin, Germany

usage restrictions for individual users

public

countries supported

all
English Automatic Speech Recognition System (MP3 file) Speech Recognizing: English
Stateproduction

This web service uses automatic speech recognition to provide transcriptions of recordings spoken in English. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • audio/mpeg
Output
  • text/plainplain text file
  • text/xmlXML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Emre Yilmaz (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
English Automatic Speech Recognition System (Ogg file) Speech Recognizing: English
Stateproduction

This web service uses automatic speech recognition to provide transcriptions of recordings spoken in English. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • audio/ogg
Output
  • text/plainplain text file
  • text/xmlXML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Emre Yilmaz (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
English Automatic Speech Recognition System (Wav file) Speech Recognizing: English
Stateproduction

This web service uses automatic speech recognition to provide transcriptions of recordings spoken in English. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languagesEnglish
  • audio/wav
Output
  • text/plainplain text file
  • text/xmlXML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Emre Yilmaz (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
EXMARaLDA Annotating
Stateproduction
key wordsSpeech, Spoken Language, Transcription

EXMARaLDA is a system for working with oral corpora on a computer. It consists of a transcription and annotation tool (Partitur-Editor), a tool for managing corpora (Corpus-Manager) and a query and analysis tool (EXAKT).

EXMARaLDA's features include, for instance:

  • time-aligned transcription of digital audio or video
  • flexible annotation for freely choosable categories
  • systematic documentation of a corpus through metadata
  • flexible output of transcription data in various layouts and formats (notation, document)
  • computer-assisted querying of transcription, annotation and metadata
  • interoperability through XML-based data formats that allow for data exchange with other tools (such as Praat, ELAN and Transcriber) and enable flexible processing and sustainable use of the data

EXMARaLDA is used by researchers worldwide in a range of contexts in which spoken language is analysed, including:

  • conversation and discourse analysis,
  • study of language acquisition and multilingualism,
  • phonetics and phonology,
  • dialectology and sociolinguistics.

EXMARaLDA was developed in the project "Computer assisted methods for the creation and analysis of multilingual data" at the Collaborative Research Center "Multilingualism" (Sonderforschungsbereich "Mehrsprachigkeit" – SFB 538) at the University of Hamburg. Since July 2011, development of EXMARaLDA has continued at the Hamburg Centre for Language Corpora and, since November 2011, in cooperation with the Archive for Spoken German at the Institute for the German Language in Mannheim.

short description

documentation

Description of the target group and its size

Scholars working with transcriptions of spoken language

formats and languages

Input
  • application/tei+xml; format-variant=tei-iso-spokenISO-24624-compliant transcription of spoken language
  • application/xml; format-variant=weblicht-tcfFile in the Text Corpus Format (*.tcf)
  • application/xml; format-variant=exmaralda-exbEXMARaLDA Basic transcription (*.exb)
  • application/xml; format-variant=transcriber-trsTranscriber annotation file (*.trs)
  • application/xml; format-variant=folker-flnFOLKER transcription (*.flk / *.fln)
  • application/xml; format-variant=elan-eafELAN annotation file (*.eaf)
  • application/xml; format-variant=clan-chaCHAT transcription file (*.cha)
  • text/plain; format-variant=praat-textgridPraat TextGrid (*.textGrid)
  • audio/mp3MP3 Audio
  • audio/oggOGG Audio
  • audio/wavWAV Audio
  • video/mp4MP4 Video
  • audio/aiffAIFF Audio
  • audio/mpegMPEG Audio
  • video/mpegMPEG Video
  • video/oggOGG Video
  • video/aviAVI Video
  • video/x-divxDIVX Video
  • video/movQuicktime Video
Output
  • application/tei+xml; format-variant=tei-iso-spokenISO-24624-compliant transcription of spoken language
  • application/xml; format-variant=weblicht-tcfFile in the Text Corpus Format (*.tcf)
  • application/xml; format-variant=exmaralda-exbEXMARaLDA Basic transcription (*.exb)
  • application/xml; format-variant=transcriber-trsTranscriber annotation file (*.trs)
  • application/xml; format-variant=folker-flnFOLKER transcription (*.flk / *.fln)
  • application/xml; format-variant=elan-eafELAN annotation file (*.eaf)
  • application/xml; format-variant=clan-chaCHAT transcription file (*.cha)
  • application/plain+praatPraat TextGrid (*.textGrid)
  • different video formats

application type

Desktop cross-platform support

network and security requirements

  • operating systemWindows, macOS, Linux
  • runtimeEnvironmentJava (included in newer versions)

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Annotating, Transcription

research activity

Annotating, Transcribing

source code available

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

hoster

  • Leibniz-Institut für Deutsche Sprache, Mannheim, Germany
  • HZSK Hamburg, Hamburg, Germany

part of an application suite

EXMARaLDA

usage restrictions for individual users

FoLiA-stats Analyzing: generic, Dutch
State: development

N-gram frequency list generation on FoLiA input.
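
As a rough illustration of what an n-gram frequency list is, the following minimal sketch counts contiguous n-grams over an already tokenized text. It does not read FoLiA and is not the FoLiA-stats implementation; the function names and the Dutch sample sentence are assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple


def ngram_frequencies(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram (as a tuple of tokens) in a token list."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ngrams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(ngrams)


def frequency_list(counts: Counter) -> List[Tuple[str, int]]:
    """Return (n-gram, count) rows, most frequent first, ties in alphabetical order."""
    rows = [(" ".join(gram), count) for gram, count in counts.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Hypothetical sample input: a tokenized Dutch sentence.
tokens = "de kat zit op de mat en de kat slaapt".split()
bigrams = frequency_list(ngram_frequencies(tokens, 2))

for gram, count in bigrams:
    print(f"{gram}\t{count}")
```

A lemma-based list would follow the same pattern, with lemmas in place of word forms.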

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Dutch, generic
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file
Output
  • wordfreqlist
  • lemmafreqlist
  • lemmaposfreqlist

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

0.2

application category

Research Software

application subcategory

N-Gramming

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Ko van der Sloot (TiCC, Tilburg University)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Fowlt (plain text) Analyzing: English
State: development

Fowlt is a free online context-sensitive spelling checker for English. It follows the setup of the Dutch spelling checker Valkuil.net. Both differ from typical spelling checkers, which mostly look for errors by comparing every word to a built-in dictionary and flagging any word without a match. Fowlt instead takes the words around each word into account. It relies on language models, which are created by training machine learning software (TiMBL and WOPR) on large amounts of text.
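
The difference between a dictionary lookup and a context-sensitive check can be sketched with a toy example. This is not how Fowlt, TiMBL or WOPR work internally; it uses simple bigram counts over a tiny invented corpus, and the confusion pair, corpus and function names are all assumptions made for illustration.

```python
from collections import Counter

# Toy training text (invented); real language models are trained on far more text.
CORPUS = (
    "i went to their house . they left their keys there . "
    "there is a cat over there ."
).split()
VOCAB = set(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

# Hypothetical pair of words that are easily confused with each other.
CONFUSABLES = {"their": "there", "there": "their"}


def dictionary_check(words):
    """Typical checker: flag only words that are missing from the vocabulary."""
    return [w for w in words if w not in VOCAB]


def context_score(words, i, candidate):
    """Count how often the candidate was seen next to its left and right neighbours."""
    left = BIGRAMS[(words[i - 1], candidate)] if i > 0 else 0
    right = BIGRAMS[(candidate, words[i + 1])] if i + 1 < len(words) else 0
    return left + right


def context_check(words):
    """Context-sensitive checker: suggest the confusable word if it fits better."""
    suggestions = []
    for i, word in enumerate(words):
        alt = CONFUSABLES.get(word)
        if alt is not None and context_score(words, i, alt) > context_score(words, i, word):
            suggestions.append((i, word, alt))
    return suggestions


sentence = "they left there keys".split()
print(dictionary_check(sentence))  # every word is a valid word, so nothing is flagged
print(context_check(sentence))     # the context suggests replacing "there"
```

The dictionary check misses the error because "there" is a correctly spelled word; only the surrounding words reveal that "their" was meant.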

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English
  • text/plain: plain text file
Output
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Spelling correction

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Fowlt (xml+folia) Analyzing: English
State: development

Fowlt is a free online context-sensitive spelling checker for English. It follows the setup of the Dutch spelling checker Valkuil.net. Both differ from typical spelling checkers, which mostly look for errors by comparing every word to a built-in dictionary and flagging any word without a match. Fowlt instead takes the words around each word into account. It relies on language models, which are created by training machine learning software (TiMBL and WOPR) on large amounts of text.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file
Output
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Spelling correction

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Frog (FoLiA XML document) Natural Language Processing: Dutch
State: production

Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/xml: XML file
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

NLP suite for Dutch

research activity

Natural Language Processing, Analyzing, POS-Tagging, Lemmatizing, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Ko van der Sloot, Antal van den Bosch, Maarten van Gompel, Bertjan Busser (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Frog (folia+xml) Natural Language Processing: Dutch
State: development

Frog's current version will tokenize, tag, lemmatize, and morphologically segment word tokens in Dutch text files, will assign a dependency graph to each sentence, will identify the base phrase chunks in the sentence, and will attempt to find and label all named entities.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file
Output
  • Tadpole Columned Output Format
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Tokenisation, Lemmatization, Morphological Analysis, Dependency Parsing, Named Entity Recognition

research activity

Natural Language Processing, Annotating, POS-Tagging, Lemmatizing, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Ko van der Sloot, Maarten van Gompel (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Frog (plain text) Annotating: Dutch
State: development

Frog's current version will tokenize, tag, lemmatize, and morphologically segment word tokens in Dutch text files, will assign a dependency graph to each sentence, will identify the base phrase chunks in the sentence, and will attempt to find and label all named entities.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • Tadpole Columned Output Format
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Tokenisation, Lemmatization, Morphological Analysis, Dependency Parsing, Named Entity Recognition

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Ko van der Sloot, Maarten van Gompel (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Frog (Text document) Natural Language Processing: Dutch
State: production

Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

NLP suite for Dutch

research activity

Natural Language Processing, Annotating, POS-Tagging, Lemmatizing, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Ko van der Sloot, Antal van den Bosch, Maarten van Gompel, Bertjan Busser (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Glem (Text to lemmatize) Annotating: Greek, Ancient (to 1453)
State: production

GLEM is a lemmatizer for Ancient Greek.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Greek, Ancient (to 1453)
  • text/plain: plain text file
Output
  • text/plain: plain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Lemmatization

research activity

Annotating, Lemmatizing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Corien Bary, Peter Berck, Iris Hendrickx, Wessel Stoop (Faculty of Philosophy, Theology and Religious Studies and Centre for Language and Speech Technology, Radboud University Nijmegen)

hoster

Faculty of Philosophy, Theology and Religious Studies and Centre for Language and Speech Technology, Radboud University Nijmegen

usage restrictions for individual users

academic usage only

countries supported

all
Grapheme to Phoneme converter (Word List) Transformation: English, Dutch
State: production

Grapheme-to-phoneme converter using Phonetisaurus.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Dutch, English
  • text/plain: plain text file
Output
  • text/plain: plain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Grapheme to Phoneme Conversion

research activity

Transformation

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Louis ten Bosch

hoster

unknown

usage restrictions for individual users

academic usage only

countries supported

all
Inkluz Analyzing: Polish
State: production

Inkluz detects foreign-language inclusions in Polish texts.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/octet-stream: arbitrary binary data

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Inclusion detection

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Iobber Annotating: Polish
State: production

Chunker for Polish. It recognises shallow syntactic structure (up to three levels) of phrases (chunks) in Polish texts.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Shallow Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
KorAP (REST) Searching
State: testing
key words: Cleanup, Query Language, Corpus Analysis, Corpus Annotation

KorAP is a new corpus analysis platform, optimized for large, multiply annotated corpora and complex search mechanisms.

KorAP supports the query languages of COSMAS II, ANNIS, Poliqarp and Poliqarp+, as well as CQL and FCQL.

KorAP is developed at the Leibniz Institute for the German Language in Mannheim. The individual modules are published as open source on GitHub.

short description

documentation

Description of the target group and its size

Corpus Linguists

formats and languages

Input
  • application/json: JSON data
Output
  • application/json: JSON data

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Searching

research activity

Searching

source code available

data communication encryption

other

privacy policy

other

authentication

OpenAUTH protocol

Creators

  • Marc Kupietz (Developer) [GND]
  • Franck Bodmer Mory (Developer) [GND]
  • Peter Harders (Developer) [GND]
  • Eliza Margaretha (Developer)
  • Helge Stallkamp (Developer) [GND]
  • Piotr Bański (Developer) [GND]
  • Elena Frick (Developer)
  • Michael Hanl (Developer)
  • Carsten Schnober (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

part of an application suite

usage restrictions for individual users

KorAP (Web) Analyzing
State: testing
key words: Cleanup, Query Language, Corpus Analysis, Corpus Annotation

KorAP is a new corpus analysis platform, optimized for large, multiply annotated corpora and complex search mechanisms.

KorAP supports the query languages of COSMAS II, ANNIS, Poliqarp and Poliqarp+, as well as CQL and FCQL.

KorAP is developed at the Leibniz Institute for the German Language in Mannheim. The individual modules are published as open source on GitHub.

short description

documentation

Description of the target group and its size

Corpus Linguists

formats and languages

Input
  • text/plain; format-variant=cosmas2: COSMAS II query
  • text/plain; format-variant=annis: ANNIS query
  • text/plain; format-variant=poliqarp: Poliqarp query
  • text/plain; format-variant=poliqarpplus: Poliqarp+ query
  • text/plain; format-variant=cql: CQL query
  • text/plain; format-variant=fcql: FCQL query
Output
  • text/html: HTML file

application type

Web-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Discovering

research activity

Analyzing

source code available

data communication encryption

other

privacy policy

other

authentication

OpenAUTH protocol

Creators

  • Marc Kupietz (Developer) [GND]
  • Franck Bodmer Mory (Developer) [GND]
  • Peter Harders (Developer) [GND]
  • Eliza Margaretha (Developer)
  • Helge Stallkamp (Developer) [GND]
  • Piotr Bański (Developer) [GND]
  • Elena Frick (Developer)
  • Michael Hanl (Developer)
  • Carsten Schnober (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

part of an application suite

KorAP

usage restrictions for individual users

LINDAT Translation Translating: Czech, German, English…
State: production

The input file size is limited to 100kB.

Translates from->to:

Czech->English, Hindi, French, Russian, German

English->Russian, German, Czech, Hindi, French

Russian->German, French, Czech, Hindi, English

German->Russian, Hindi, Czech, English, French

French->Russian, German, Czech, English, Hindi
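
The language pairs and the size limit listed above can be checked before a file is submitted. The sketch below only encodes the information stated in this entry and does not call the LINDAT service; the function names are hypothetical, and reading 100 kB as 100,000 bytes of UTF-8 text is an assumption.

```python
# Source languages listed above; Hindi appears only as a target language.
SOURCES = {"Czech", "English", "Russian", "German", "French"}
TARGETS = SOURCES | {"Hindi"}

# Assumption: "100kB" is read as 100,000 bytes (it could also mean 102,400).
MAX_INPUT_BYTES = 100 * 1000


def can_translate(source: str, target: str) -> bool:
    """Return True if the pair source->target appears in the list above."""
    return source in SOURCES and target in TARGETS and source != target


def fits_size_limit(text: str) -> bool:
    """Return True if the text, encoded as UTF-8 (an assumption), is within the limit."""
    return len(text.encode("utf-8")) <= MAX_INPUT_BYTES


print(can_translate("Czech", "Hindi"))   # listed: Czech->Hindi
print(can_translate("Hindi", "Czech"))   # not listed: Hindi is never a source
print(fits_size_limit("Dobrý den"))
```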

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

See the terms of use for LINDAT services: https://lindat.mff.cuni.cz/en/terms-of-use

formats and languages

Input
  • languages: German, Russian, Czech, English, French
  • text/plain: plain text file
Output
  • text/plain: plain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

master

application category

Research Software

application subcategory

Machine Translation

research activity

Translating

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Institute of Formal and Applied Linguistics

hoster

Charles University, Prague, Czech Republic

usage restrictions for individual users

academic usage only

countries supported

all
Liner2 (hosted by D4Science) Annotating: Polish
State: production

This is an experimental integration of a D4Science NLP processing service (NER Liner 2). This service identifies names of persons, locations, organizations, as well as money amounts, time and date expressions in Polish texts automatically.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Annotating, Named Entity Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

D4Science staff

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
Liner2 Annotating: Polish
State: production

Named entity and temporal expression recognition.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Annotating, Named Entity Recognition

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
MaltParser Annotating: Polish
State: production

A dependency parsing chain for Polish. The tools used include Morfeusz 2 with the SGJP dictionary (for morphological analysis), wcrft2 (for tagging), and MaltParser with a model for Polish. The CoNLL output can be visualised with DepSVG, a visualiser for dependency trees and predicate-argument structures.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • CoNLL Format

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Morfeusz 2 Annotating: Polish
State: production

Morphological analysis of Polish texts by Morfeusz 2 (based on the SGJP dictionary)

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Morphological Analysis

research activity

Annotating

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
MorphoDiTa Annotating: Polish
State: production

Morphological dictionary and tagger for the analysis of natural language texts in Polish.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Morphological Analysis

research activity

Annotating

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
NER NLTK Annotating: English
State: production

Named entity recogniser for English, based on NLTK.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Annotating

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
NLP-HUB (multiple NER tools) Annotating: German, English, French…
State: production

This is an experimental integration of a D4Science NLP processing service hub. This service runs a number of NER tools in parallel and merges their results. It automatically identifies names of persons, locations and organizations, as well as money amounts, time and date expressions, and other expressions, in English, French, Italian, Spanish and German texts.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, French, Italian, Spanish, German
  • text/plain: plain text file
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Named Entity Recognition

research activity

Annotating

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

D4Science staff

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
Oersetter (FRY-NLD) Translating: Western Frisian
State: development

Oersetter is a Frisian-Dutch Machine Translation system.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Western Frisian
  • text/plain: plain text file
Output
  • text/plain: plain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Machine Translation

research activity

Translating

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Oersetter (NLD-FRY) Translating: Dutch
State: development

Oersetter is a statistical machine translation (SMT) system for Frisian to Dutch and Dutch to Frisian. A parallel training corpus has been established, which has subsequently been used to automatically learn a phrase-based SMT model. The translation system is built around the open-source SMT software Moses.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • text/plain: plain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Machine Translation

research activity

Translating

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Opener Tokenizer Analyzing: German, English, French…
State: development

Tokenizer for Dutch, English, German, French, Spanish and Italian. Consumes plain text and produces TCF.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, Italian, Spanish, French, Dutch, German
  • text/plain: plain text file
Output
  • application/tcf+xml: TCF file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Tokenisation

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

CLARIN-IT

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
PICCL (DJVU document containing scanned pages (perform OCR)) Correcting: German, Greek, Modern (1453-), English…
State: production

PICCL offers a workflow for corpus building and builds on a variety of tools. Its primary component is TICCL, a Text-Induced Corpus Clean-up system that performs spelling correction and OCR post-correction (normalisation of spelling variants etc.).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, Dutch, Finnish, French, German, Greek, Modern (1453-), Greek, Ancient (to 1453), Icelandic, Italian, Latin, Polish, Portuguese, Romanian, Russian, Spanish, Swedish
  • application/pdf (Adobe PDF file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Normalisation

research activity

Correcting, Data Cleansing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Martin Reynaert, Maarten van Gompel, Ko van der Sloot

hoster

unknown

usage restrictions for individual users

academic usage only

countries supported

all
PICCL (FoLiA with OCR text layer already present (no OCR)) Correcting: German, Greek, Modern (1453-), English…
State: production

PICCL offers a workflow for corpus building and builds on a variety of tools. The primary component of PICCL is TICCL, a Text-induced Corpus Clean-up system, which performs spelling correction and OCR post-correction (normalisation of spelling variants, etc.).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, Dutch, Finnish, French, German, Greek, Modern (1453-), Greek, Ancient (to 1453), Icelandic, Italian, Latin, Polish, Portuguese, Romanian, Russian, Spanish, Swedish
  • text/xml (XML file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Normalisation

research activity

Correcting, Data Cleansing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Martin Reynaert, Maarten van Gompel, Ko van der Sloot

hoster

unknown

usage restrictions for individual users

academic usage only

countries supported

all
PICCL (PDF document with embedded text (no OCR)) Correcting: German, Greek, Modern (1453-), English…
State: production

PICCL offers a workflow for corpus building and builds on a variety of tools. The primary component of PICCL is TICCL, a Text-induced Corpus Clean-up system, which performs spelling correction and OCR post-correction (normalisation of spelling variants, etc.).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, Dutch, Finnish, French, German, Greek, Modern (1453-), Greek, Ancient (to 1453), Icelandic, Italian, Latin, Polish, Portuguese, Romanian, Russian, Spanish, Swedish
  • application/pdf (Adobe PDF file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Normalisation

research activity

Correcting, Data Cleansing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Martin Reynaert, Maarten van Gompel, Ko van der Sloot

hoster

unknown

usage restrictions for individual users

academic usage only

countries supported

all
PICCL (PDF document with scanned pages (images) (perform OCR)) Correcting: German, Greek, Modern (1453-), English…
State: production

PICCL offers a workflow for corpus building and builds on a variety of tools. The primary component of PICCL is TICCL, a Text-induced Corpus Clean-up system, which performs spelling correction and OCR post-correction (normalisation of spelling variants, etc.).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, Dutch, Finnish, French, German, Greek, Modern (1453-), Greek, Ancient (to 1453), Icelandic, Italian, Latin, Polish, Portuguese, Romanian, Russian, Spanish, Swedish
  • application/pdf (Adobe PDF file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Normalisation

research activity

Correcting, Data Cleansing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Martin Reynaert, Maarten van Gompel, Ko van der Sloot

hoster

unknown

usage restrictions for individual users

academic usage only

countries supported

all
PICCL (Plain-text document (UTF-8, no OCR)) Correcting: German, Greek, Modern (1453-), English…
State: production

PICCL offers a workflow for corpus building and builds on a variety of tools. The primary component of PICCL is TICCL, a Text-induced Corpus Clean-up system, which performs spelling correction and OCR post-correction (normalisation of spelling variants, etc.).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, Dutch, Finnish, French, German, Greek, Modern (1453-), Greek, Ancient (to 1453), Icelandic, Italian, Latin, Polish, Portuguese, Romanian, Russian, Spanish, Swedish
  • text/plain (plain text file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Normalisation

research activity

Correcting, Data Cleansing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Martin Reynaert, Maarten van Gompel, Ko van der Sloot

hoster

unknown

usage restrictions for individual users

academic usage only

countries supported

all
PICCL (TIF image of a scanned page (perform OCR)) Correcting: German, Greek, Modern (1453-), English…
State: production

PICCL offers a workflow for corpus building and builds on a variety of tools. The primary component of PICCL is TICCL, a Text-induced Corpus Clean-up system, which performs spelling correction and OCR post-correction (normalisation of spelling variants, etc.).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English, Dutch, Finnish, French, German, Greek, Modern (1453-), Greek, Ancient (to 1453), Icelandic, Italian, Latin, Polish, Portuguese, Romanian, Russian, Spanish, Swedish
  • image/tiff
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Normalisation

research activity

Correcting, Data Cleansing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Martin Reynaert, Maarten van Gompel, Ko van der Sloot

hoster

unknown

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (Alpino XML for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/alpino+xml
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (CONLL-U format for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/plain (plain text file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (Docbook for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • application/docbook+xml
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (EPUB for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • application/epub+zip
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (FoLiA XML input for conversion to HTML) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/xml (XML file)
Output
  • text/html (HTML file)
  • text/plain (plain text file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (FoLiA XML input for conversion to ReStructuredText) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/xml (XML file)
Output
  • text/plain (plain text file)
  • text/rst

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (FoLiA XML input for conversion to text) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/xml (XML file)
Output
  • text/plain (plain text file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (FoLiA XML input for upgrade to a newer FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/xml (XML file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (FoLiA XML input for validation) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/xml (XML file)
Output
  • text/plain (plain text file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (HTML for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/html (HTML file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (LaTeX source for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • application/x-latex
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (Markdown Input for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/markdown
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (MediaWiki Markup (Wikipedia and others) for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/plain (plain text file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (MS Word (Office Open XML, docx) input for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • application/msword (Microsoft Word file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (NAF XML for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/naf+xml
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (OpenDocument Text Document (odt) for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • application/msword (Microsoft Word file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (PDF with embedded text (pdf) for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • application/pdf (Adobe PDF file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (Plain text input for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/plain (plain text file)
Output
  • text/plain (plain text file)
  • text/xml (XML file)

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (ReStructuredText Input for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • text/rst
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Piereling (TEI P5 XML input for conversion to FoLiA) Converting
State: production

Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • application/tei+xml: TEI-P5-compliant XML
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Data Conversion

research activity

Converting

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
ReSpa Extracting: Polish
State: production

Keyword extraction for Polish texts by ReSpa, based on representing text documents as word graphs.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Keyword Extractor

research activity

Extracting

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Serel Analyzing: Polish
State: production

Detection of semantic relations between Named Entities in Polish texts by Serel.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Relation between named entities detection

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Spacy (hosted by D4Science) - DE Annotating: German
State: production

This is an experimental integration of a D4Science NLP processing service (spaCy). This service performs dependency parsing for plain German text. For more information on spaCy, see https://spacy.io.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: German
  • text/plain: plain text file
Output
  • text/tab-separated-values: tabular data, tab-separated values

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

D4Science staff

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
Spacy (hosted by D4Science) - EN Annotating: English
State: production

This is an experimental integration of a D4Science NLP processing service (spaCy). This service performs dependency parsing for plain English text. For more information on spaCy, see https://spacy.io.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English
  • text/plain: plain text file
Output
  • text/tab-separated-values: tabular data, tab-separated values

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

v1.0

application category

Research Software

application subcategory

Dependency Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

D4Science staff

hoster

Pisa, Italy

usage restrictions for individual users

academic usage only

countries supported

all
Spatial Identifying: Polish
State: production

Recognition of spatial expressions in Polish texts.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/json: JSON data

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Spatial expression detection

research activity

Identifying

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Spejd Annotating: Polish
State: production

Spejd - a partial, shallow parser for Polish with rule-based morphosyntactic disambiguation.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Shallow Parsing

research activity

Annotating, Parsing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Spreek2Schrijf (Flemish input HTML (very specific formatting, not just any HTML)) Speech Recognizing: Dutch
State: production

This web service uses speech recognition to convert recordings from the Tweede Kamer (the Dutch House of Representatives) into a speech transcript, and a translation engine to then convert that transcript into written language.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/html: HTML file
Output
  • text/html: HTML file
  • text/plain: plain text file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel, Louis ten Bosch (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Spreek2Schrijf (MP3 file) Speech Recognizing: Dutch
State: production

This web service uses speech recognition to convert recordings from the Tweede Kamer (the Dutch House of Representatives) into a speech transcript, and a translation engine to then convert that transcript into written language.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • audio/mpeg
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel, Louis ten Bosch (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Spreek2Schrijf (Ogg file) Speech Recognizing: Dutch
State: production

This web service uses speech recognition to convert recordings from the Tweede Kamer (the Dutch House of Representatives) into a speech transcript, and a translation engine to then convert that transcript into written language.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • audio/vorbis
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel, Louis ten Bosch (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Spreek2Schrijf (Time Marked Conversation (CTM) with punctuation) Speech Recognizing: Dutch
State: production

This web service uses speech recognition to convert recordings from the Tweede Kamer (the Dutch House of Representatives) into a speech transcript, and a translation engine to then convert that transcript into written language.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/xml: XML file
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel, Louis ten Bosch (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Spreek2Schrijf (Wav file) Speech Recognizing: Dutch
State: production

This web service uses speech recognition to convert recordings from the Tweede Kamer (the Dutch House of Representatives) into a speech transcript, and a translation engine to then convert that transcript into written language.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU General Public License v3

formats and languages

Input
  • languages: Dutch
  • audio/vnd.wave
Output
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Speech Recognition

research activity

Speech Recognizing, Transcribing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Maarten van Gompel, Louis ten Bosch (Centre for Language and Speech Technology, Radboud University)

hoster

Centre for Language and Speech Technology, Radboud University

usage restrictions for individual users

academic usage only

countries supported

all
Summarize Analyzing: Polish
State: production

Automated word-graph-based summarisation of Polish texts.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/octet-stream: arbitrary binary data

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Text Summarization

research activity

Analyzing, Interpreting

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
T-scan (Text Input) Analyzing: Dutch
State: production

T-Scan is an analysis tool for Dutch texts that assesses the complexity of a text. It is based on original work by Rogier Kraf (Utrecht University; see Kraf et al., 2009). The code has been reimplemented and extended by Ko van der Sloot (Tilburg University), and is currently maintained and developed further by Martijn van der Klis (Utrecht University).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU Affero General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • application/xslt+xml
  • text/csv: tabular data, comma-separated values
  • text/plain: plain text file
  • text/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Analytics

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

Unknown

authentication requirements

Yes. Before using the tool, please register at https://webservices-lst.science.ru.nl/register/

Creators

Ko van der Sloot, Martijn van der Klis, Maarten van Gompel (Utrecht University)

hoster

Utrecht University

usage restrictions for individual users

academic usage only

countries supported

all
T-scan Analyzing: Dutch
State: development

T-Scan is a new tool for analyzing Dutch text. It aims at extracting text features that are theoretically interesting, in that they relate to genre and text complexity, as well as practically interesting, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives its features from tools such as Frog and Alpino, and resources such as SoNaR, SUBTLEX-NL and Referentie Bestand Nederlands.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

licences

GNU Affero General Public License v3

formats and languages

Input
  • languages: Dutch
  • text/plain: plain text file
Output
  • text/folia+xml: Format for Linguistic Annotation (FoLiA) file
  • text/xsl: XSLT Stylesheet
  • text/csv: tabular data, comma-separated values

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Analytics

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

Proprietary

authentication requirements

Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.

Creators

Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen), Martijn van der Klis (Utrecht University)

hoster

Nijmegen, The Netherlands (CLAM Webservices)

usage restrictions for individual users

academic usage only

countries supported

all
Tagger NLTK Annotating: English
State: production

Morphological Analysis for English texts.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: English
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/rtf: Word Processing File in the Rich Text Format
Output
  • application/xml: XML file

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Morphological Analysis

research activity

Annotating

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
TEILicht-align Collating
State: production
Key words: Forced Alignment, Speech, Spoken Language, Transcription

RESTful webservices for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

align: Pseudo-alignment using Phonetic Transcription or Orthographic Information

short description

documentation

Description of the target group and its size

Scholars working with transcriptions of spoken language

formats and languages

Input
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Text Enhancement

research activity

Collating

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-guess Analyzing
State: production
Key words: Cleanup, Speech, Spoken Language, Transcription

RESTful webservices for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

guess: language-detection

short description

documentation

Description of the target group and its size

Scholars working with transcriptions of spoken language

formats and languages

Input
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Language Detection

research activity

Analyzing

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-identify Identifying
State: production
Key words: XML ID, Speech, Spoken Language, Transcription

RESTful webservices for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

identify: adding and removing XML IDs

short description

documentation

Description of the target group and its size

Scholars working with transcriptions of spoken language

formats and languages

Input
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

other

research activity

Identifying

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-normalize Converting
State: production
Key words: Speech, Spoken Language, Transcription, Orthographic Normalization

RESTful webservices for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

normalize: OrthoNormal-like Normalization of orthography

short description

documentation

Description of the target group and its size

Scholars working with transcriptions of spoken language

formats and languages

Input
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Orthographic Normalization

research activity

Converting

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-pos Annotating
State: production
Key words: Speech, Spoken Language, Transcription

RESTful webservices for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

pos: POS-Tagging with the TreeTagger

short description

documentation

Description of the target group and its size

Scholars working with transcriptions of spoken language

formats and languages

Input
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Part-Of-Speech Tagging

research activity

Annotating

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-segmentize Converting
State: production
Key words: Cleanup, Speech, Spoken Language, Transcription

RESTful webservices for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

segmentize: segmentation according to transcription conventions

short description

documentation

Description of the target group and its size

Scholars converting simple transcriptions to TEI/ISO-conformant transcriptions

formats and languages

Input
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Conversion

research activity

Converting

source code available

data communication encryption

other

privacy policy

other

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-text2iso Converting
State: production
Key words: Cleanup, Speech, Spoken Language, Transcription

RESTful webservices for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

text2iso: converting plain text in Simple EXMARaLDA format to ISO-TEI-annotated texts

short description

documentation

Description of the target group and its size

Scholars converting simple transcriptions to TEI/ISO-conformant transcriptions

formats and languages

Input
  • application/plain; format-variant=exmaralda: Simple EXMARaLDA transcription
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Conversion

research activity

Converting

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-text2seg Converting
State: production
key words: Cleanup, Speech, Spoken Language, Transcription

RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

text2seg: converting plain text in Simple EXMARaLDA format to ISO-TEI-annotated texts, combined with segmentation according to transcription standards

short description

documentation

Description of the target group and its size

Scholars converting simple transcriptions to TEI/ISO-conformant transcriptions

licences

unknown

formats and languages

Input
  • application/plain; format-variant=exmaralda: Simple EXMARaLDA transcription
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Conversion

research activity

Converting

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TEILicht-unidentify Identifying
State: production
key words: XML ID, Speech, Spoken Language, Transcription

RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to the ISO standard ISO 24624:2016(E) Language resource management – Transcription of spoken language. The services are built on the library teispeechtools; the source code of the services is available on GitHub. Currently, we offer:

unidentify: removing XML IDs
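As an illustration of what removing XML IDs means in practice, the following Python sketch strips every xml:id attribute from a small TEI fragment using only the standard library. It is an assumption-laden sketch, not the service's code: the function name remove_xml_ids and the sample document are invented for this example, and how the real service treats references to removed IDs is not described here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# xml:id belongs to the reserved XML namespace; ElementTree expands it like this.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def remove_xml_ids(xml_text: str) -> str:
    """Return the document with every xml:id attribute removed (hypothetical helper)."""
    # Keep TEI as the default namespace when serializing.
    ET.register_namespace("", TEI_NS)
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.attrib.pop(XML_ID, None)  # no error if the attribute is absent
    return ET.tostring(root, encoding="unicode")


sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<text><body><u xml:id="u1" who="#TOM">Hallo</u></body></text>'
    "</TEI>"
)
cleaned = remove_xml_ids(sample)
print(cleaned)
```

Other attributes, such as who, are left untouched; only the identifiers themselves are dropped.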

short description

documentation

Description of the target group and its size

Scholars working with transcriptions of spoken language

formats and languages

Input
  • support for multilingual documents
  • accepts any language
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML
Output
  • application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
  • application/tei+xml: TEI-P5-compliant XML

application type

REST-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

Research Software

application subcategory

Language Detection

research activity

Identifying

source code available

data communication encryption

(not applicable)

privacy policy

(not applicable)

authentication

no authentication

Creators

  • Bernhard Fisseni (Developer)
  • Thomas Schmidt (Developer)

hoster

Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

usage restrictions for individual users

TermoPL Extracting: Polish
State: production

TermoPL is a tool for automated extraction of terminology from Polish texts.

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: word processing file in the Rich Text Format
Output
  • application/json: JSON data

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Terminology Extraction

research activity

Extracting

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
TextGrid Laboratory Editing: German, English
State: production
key words: Corpus and Digital Library of TextGrid, Open Archival Information System (OAIS)

With the TextGridLab, a free software package, you can access tools and services to create, manage and edit research data. The open source software is the entry point to the virtual research environment. It is available for Windows, Mac OS X and Linux and provides differentiated access rights management within the protected research environment. The TextGridLab is optimised for XML/TEI development, e.g. in the context of digital editions.

TextGridLab features include, for instance:

  • an editor for text and XML with WYSIWYG functionality
  • an integrated Unicode character table
  • a Text-Image-Link Editor
  • the Dictionary Search Tool
  • the note editor MEISE

The infrastructure includes powerful Project and User Management, a Project Browser/Navigator, a Search Tool, a Metadata Editor, an Aggregation Composer, an Import/Export Tool, revisions, and collection publication (in the repository) supported by automated metadata validation.

TextGridLab is used by German researchers in different research networks and edition projects, such as:

  • the hybrid edition of Theodor Fontane's notebooks (Fontane Research Centre of the University of Göttingen)
  • the text database and dictionary of classical Maya (University of Bonn)
  • the Library of Neology (University of Münster)

(see https://textgrid.de/en/web/guest/kooperationsprojekte)

TextGrid was a project of ten partners, funded by the German Federal Ministry of Education and Research (BMBF) for the period from June 2012 to May 2015 (reference number: 01UG1203A). Since 2016, TextGrid has been part of the DARIAH-DE Research Infrastructure.

short description

documentation

Description of the target group and its size

Scholars in the humanities who want to edit, store and publish their data in a sustainable environment.

formats and languages

Input
  • languages: German, English
  • text/plain: plain text file
  • application/xml: XML file
  • image/tiff
Output

application type

Desktop cross-platform support

network and security requirements

  • processor: 32 / 64 bit
  • operating system: Windows, macOS, Linux
  • runtime environment: Java Runtime Environment, JRE Version 6
  • installation license: https://textgrid.liferay.de.dariah.eu/en/web/guest/terms-of-use

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

(not applicable)

application subcategory

Editing

research activity

Editing, Storing, Archiving

source code available

authentication

DFN Identity Provider

Creators

hoster

  • SUB, Göttingen, Germany
  • GWDG, Göttingen, Germany

part of an application suite

TextGrid

usage restrictions for individual users

academic usage only

countries supported

all
TextGrid Repository Portal Archiving
State: production
key words: Virtual research environment, Digital scholarly editing, Editions

The TextGrid Repository is a long-term archive for research data in the humanities. It provides a comprehensive, searchable and re-usable stock of texts and images. The TextGrid Repository is based on the principles of Open Access and the FAIR principles and was awarded the CoreTrustSeal in 2020. For researchers, it offers a sustainable, permanent and secure way to publish their research data in a citable manner and to describe them comprehensibly by means of required metadata. More about sustainability, FAIR and Open Access can be found in the TextGrid Repository's mission statement.

documentation

Description of the target group and its size

Scholars in the humanities who want to edit, store and publish their data in a sustainable environment.

formats and languages

Output
  • application/xml+tei, Schema
  • text/plain: plain text file
  • application/epub+zip
  • text/html: HTML file
  • application/zip: zip archive

Localization

German, English

application type

Web-UI

developer documentation

Datenblatt (Fact sheet)

contact

version

application category

(not applicable)

application subcategory

Publishing

research activity

Archiving, Publishing

source code available

authentication

Shibboleth academic login, DFN Identity Provider, eduGAIN

Creators

hoster

  • Göttingen State and University Library (SUB), Göttingen, Germany
  • Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), Göttingen, Germany

usage restrictions for individual users

public

countries supported

all
TF-IDF Analyzing: Polish
State: production

Calculation of term frequency (TF), inverse document frequency (IDF) and TF-IDF scores for Polish texts.
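For readers unfamiliar with these measures, the following Python sketch computes one common TF-IDF variant: TF as the relative frequency of a term within a document, and IDF as the natural logarithm of the number of documents divided by the number of documents containing the term. Which exact formula the service applies is not stated here, so this is an assumption for illustration; the function name tf_idf and the sample tokens are likewise invented.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF per document for pre-tokenized texts (illustrative variant)."""
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append(
            {
                # TF = relative frequency, IDF = ln(N / df)
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return scores


docs = [
    ["kot", "pies", "kot"],
    ["pies", "dom"],
]
weights = tf_idf(docs)
print(weights)
```

With this variant, a term that occurs in every document (here "pies") receives a score of zero, while terms confined to a single document are weighted higher.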

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • text/plain: plain text file
  • application/msword: Microsoft Word file
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
  • application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
  • application/vnd.oasis.opendocument.text: OpenDocument Text file
  • application/pdf: Adobe PDF file
  • text/html: HTML file
  • text/rtf: word processing file in the Rich Text Format
Output
  • text/csv: tabular data, comma-separated values

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

TF/IDF/TF-IDF calculation

research activity

Analyzing

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL

hoster

Wrocław, Poland

usage restrictions for individual users

academic usage only

countries supported

all
Topic Analyzing: Polish
State: production

Topic modelling of texts in Polish. The tools used include: Morfeusz 2 with SGJP dictionary (for morphological analysis), wcrft2 (for tagging), gensim and mallet (for topic modelling), and D3.js plus D3-tip (for result visualisation).

short description

documentation

Description of the target group and its size

Scholars doing automatic analysis of texts

formats and languages

Input
  • languages: Polish
  • application/zip: zip archive
Output
  • application/octet-stream: arbitrary binary data

application type

Web-UI

Datenblatt (Fact sheet)

contact

version

1.0

application category

Research Software

application subcategory

Topic Modelling

research activity

Analyzing, Modeling

data communication encryption

unknown

privacy policy

unknown

authentication

no authentication

Creators

Clarin-PL