CoNLL 2021: Reproducibility Material
Try our tokenization repair methods in the interactive web demo.
Evaluation web applications
Click through our benchmarks and get a visualisation of the results in the evaluation web app.
The data contains our benchmarks described in the paper, as well as trained models and predicted sequences from all our methods (1 GB compressed). In addition, you can download our training data (6 GB compressed).
Corrected ACL anthology corpus
A whitespace-corrected version of the ACL anthology corpus will be made available for download with the publication of our paper. In the meantime, you can explore the corrected corpus below.
The code comes with a Docker setup for easy reproducibility. A readme file in the code directory explains how to set up the Docker container. If you are not familiar with Docker, please visit docker.com.
The Docker container allows you to try our methods interactively, run them on our benchmarks (or on yours!), and run the evaluation. Make targets simplify the program calls and give further explanations.
The latest version is 1.2.1 (June 14, 2021).
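To give an idea of what an evaluation of predicted sequences against a benchmark can look like, here is a minimal sketch in Python. It is not taken from our code: the function names and the example sequences are hypothetical, and exact-match sequence accuracy is used only as a simple illustrative metric. The sketch assumes that predictions and ground truth are aligned one sequence per entry.

```python
def sequence_accuracy(predicted, ground_truth):
    """Return the fraction of predicted sequences that exactly match the ground truth.

    Hypothetical helper for illustration; not part of the released code.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(predicted)


def same_characters(a, b):
    """Check that two sequences differ only in their spaces.

    A tokenization repair method changes whitespace only, so this is a
    useful sanity check before comparing a prediction to the ground truth.
    """
    return a.replace(" ", "") == b.replace(" ", "")


# Made-up example data (not from the benchmarks).
predicted = ["hello world", "token ization repair", "a test"]
ground_truth = ["hello world", "tokenization repair", "a test"]

accuracy = sequence_accuracy(predicted, ground_truth)
print(f"sequence accuracy: {accuracy:.3f}")
```

In this made-up example, two of the three predictions match the ground truth exactly. The second prediction still contains a wrong space but passes the `same_characters` check, because it differs from the ground truth only in whitespace.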
arXiv 2020 paper
Here you can find the reproducibility material for our arXiv 2020 paper, including benchmarks, results, and trained models: Download arXiv 2020 data