Zipfs-Law-Language-Detector/execenv

Instructions

Step 1. Compile

Dependencies

Nim (>= 2.0.6)
Rust & Cargo
sqlite3

For nimble deps:

nimble install db_connector tiny_sqlite
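
Before compiling, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch, not part of this repository: `missing_tools` is a hypothetical helper, and the binary names `nim`, `nimble`, `cargo`, and `sqlite3` are assumed to be the usual command names for these dependencies.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print every name from the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the tool can be resolved
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (binary names are assumptions):
# missing_tools nim nimble cargo sqlite3
```

If the call prints nothing, every listed tool was found. Note that this only checks for the `sqlite3` command-line tool, not for the SQLite development library.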

To init:

./compile.sh
./initdbs.sh

Download Wikipedia

Make a folder named wikipedia in ../wikimedia to hold the Wikipedia data. On my machine, it's symlinked to /mnt. I imagine many people will want to do the same, so the folder is not created by default.
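
If you also want the data to live on another disk, the setup can be sketched roughly as follows. This is an illustration under assumptions: `link_data_dir` is a hypothetical helper, and both `/mnt/wikipedia` and `../wikimedia/wikipedia` are guesses at the paths involved, so adjust them to your machine.

```shell
#!/bin/sh
# link_data_dir (hypothetical helper): keep the data in storage_dir and
# expose it through a symbolic link at link_path.
link_data_dir() {
    storage_dir=$1   # where the data physically lives
    link_path=$2     # where the project scripts look for it
    # Create the storage folder and the parent folder of the link.
    mkdir -p "$storage_dir" "$(dirname "$link_path")"
    # -s: symbolic link, -f: replace an existing link,
    # -n: do not follow an existing link that points to a directory
    ln -sfn "$storage_dir" "$link_path"
}

# Example (both paths are assumptions):
# link_data_dir /mnt/wikipedia ../wikimedia/wikipedia
```

If `link_path` already exists as a real directory rather than a link, `ln` will place the new link inside it, so remove or rename that directory first.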

./downloadWikipedia.sh

Build data

./parquet_thing --test_data
./parquet_thing --words
./geneticTraining  --iterations=-20 --output_words=50
./compile.sh
./scoring

This collects the character occurrences, the word occurrences, and the test data, trains the words, recompiles, and finally scores the result.
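
Because each step depends on the output of the previous one, it may be convenient to run them through a small wrapper that stops at the first failure. The sketch below is not part of this repository; `run_steps` is a hypothetical helper, and the commands in the usage comment are copied from the list above.

```shell
#!/bin/sh
# run_steps (hypothetical helper): run each command line in order and
# stop as soon as one of them exits with a non-zero status.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        sh -c "$step" || {
            echo "step failed: $step" >&2
            return 1
        }
    done
}

# Usage with the commands from this section:
# run_steps \
#     "./parquet_thing --test_data" \
#     "./parquet_thing --words" \
#     "./geneticTraining --iterations=-20 --output_words=50" \
#     "./compile.sh" \
#     "./scoring"
```

Stopping early avoids scoring against stale or half-built data when an earlier step fails.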