Zipfs-Law-Language-Detector/execenv/README.md
2024-10-13 15:18:16 -04:00

Instructions

Step 1. Compile

Dependencies

Nim (>= 2.0.6)
Rust & Cargo
sqlite3

For nimble deps:

nimble install db_connector/db_sqlite tiny_sqlite
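
Before compiling, it can help to confirm the tools above are on your PATH. The following is a minimal sketch, not part of the repo; the helper name missing_tools is made up for illustration, and the binary names (nim, nimble, cargo, sqlite3) are assumed to match the dependencies listed above. It does not check the Nim version.

```shell
# Sketch: print the name of every tool that is not found on PATH.
# missing_tools is a hypothetical helper, not a script shipped with this repo.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Binary names are assumptions based on the dependency list.
missing=$(missing_tools nim nimble cargo sqlite3)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
else
  echo "All required tools found."
fi
```

If anything is reported missing, install it before running the init scripts.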

To init:

./compile.sh
./initdbs.sh

Download Wikipedia

Make a folder named wikipedia in ../wikimedia for the Wikipedia data. On my machine, it's symlinked to /mnt. I imagine many people will want to do the same, so the folder isn't created by default.
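
One way to set the folder up, either as a plain directory or as a symlink to a larger disk, is sketched below. The function name make_data_dir is hypothetical, and the paths in the usage note (../wikimedia/wikipedia and /mnt/wikipedia) are assumptions about the layout, so adjust them to your machine.

```shell
# Sketch: create the data folder, optionally as a symlink to other storage.
# make_data_dir is a hypothetical helper, not part of this repo.
make_data_dir() {
  data_dir=$1
  storage_dir=${2:-}
  # Make sure the parent directory exists either way.
  mkdir -p "$(dirname "$data_dir")"
  if [ -n "$storage_dir" ]; then
    mkdir -p "$storage_dir"
    # Only create the link if nothing exists at that path yet.
    [ -e "$data_dir" ] || ln -s "$storage_dir" "$data_dir"
  else
    mkdir -p "$data_dir"
  fi
}
```

For a plain folder, call `make_data_dir ../wikimedia/wikipedia`. To keep the data on another disk, pass the storage location as a second argument, for example `make_data_dir ../wikimedia/wikipedia /mnt/wikipedia`.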

./downloadWikipedia.sh

Build data

./parquet_thing --test_data
./parquet_thing --words
./geneticTraining --iterations=-20 --output_words=50
./compile.sh
./scoring

This gets the character occurrences, the word occurrences, and the test data, trains the words, recompiles, and finally scores the result.
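
Since each step depends on the one before it, you may want to stop at the first failure instead of running the rest. A minimal sketch of a wrapper follows; run_steps is a hypothetical helper, not something the repo provides.

```shell
# Sketch: run each command string in order and stop at the first failure.
# run_steps is a hypothetical helper, not part of this repo.
run_steps() {
  for step in "$@"; do
    echo ">> $step"
    sh -c "$step" || { echo "Step failed: $step" >&2; return 1; }
  done
}
```

To run the build with it, pass the commands from above as quoted strings: `run_steps "./parquet_thing --test_data" "./parquet_thing --words" "./geneticTraining --iterations=-20 --output_words=50" "./compile.sh" "./scoring"`.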