![logo](https://albassort.github.io/CDN/miTTenS/miTTenS-light.png)

# miTTenS

- Art by Sharaa Yippie

- For Linux, and Linux only.

This is an application for controlling Piper with your keyboard. It integrates Piper with your computer in a way that gives you more control, and it works via your clipboard. It is intended for visually disabled people, such as myself, to aid in both writing and reading. It has language detection to automatically change the language depending on the text.

Piper has a natural sound and synthesizes quickly for the quality it gives. With `en_US-danny-low.json`, this entire README took about 9 seconds to synthesize. That is quite fast, but it isn't the use case for the tool. The above paragraph took 0.25s to synthesize.

This project's structure is generic enough to allow alternate TTS models to be integrated in the future.

# For Developers
## Structure

miTTenS works over UDP and takes user input over a port. It is event based and uses the following packet structure:

```
All numbers are encoded in big endian
---
| Length (U16) | Command (U8) | Body (Size = Length, ASCII) |
e.g.
---
0x00 0x05 | 0x01 | hello
---
```

Or the equivalent Nim:

```nim
proc createStringForSending(a: string, operation: uint8): string =
  # Encode the body length as a big-endian uint16, matching the layout above.
  let length = a.len.uint16
  result = newStringOfCap(a.len + 3)
  result.add char(length shr 8)     # high byte of the length first
  result.add char(length and 0xFF)  # then the low byte
  result.add char(operation)        # one-byte command
  result.add a                      # body
```

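For clients written in other languages, here is a minimal Python sketch of the same framing. It assumes the layout documented above (big-endian 16-bit length equal to the body size, then a one-byte command, then the body). The name `build_packet` is made up for this example and is not part of miTTenS.

```python
import struct


def build_packet(command: int, body: bytes) -> bytes:
    """Frame one message as: length (u16, big-endian) | command (u8) | body."""
    if not 0 <= command <= 0xFF:
        raise ValueError("command must fit in one byte")
    if len(body) > 0xFFFF:
        raise ValueError("body is too long for a 16-bit length field")
    # ">HB" = big-endian unsigned 16-bit length, then an unsigned 8-bit command.
    return struct.pack(">HB", len(body), command) + body


packet = build_packet(1, b"hello")  # 1 = PLAY_MESSAGE
print(packet.hex(" "))  # 00 05 01 68 65 6c 6c 6f
```

The first three bytes match the `0x00 0x05 | 0x01 | hello` example from the packet structure.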
And the following commands exist:

```
- PLAY_MESSAGE = 1 --- Queues, in order.
- PLAY_MESSAGE_PRIORITY = 2 --- Interrupts the currently played message, and clears the queue.
- SET_LENGTH_SCALE = 3 (the body must be float32 big endian) --- Sets the current length scale.
- STOP_PLAYING = 4 --- Stops playing the current message.
- RESUME = 5 --- Continues playing the message from where the message was stopped.
- SET_MODEL = 6 --- Sets the current model to the rowid in the internal database.
- INC_MODEL = 7 --- Increases the model to the next available model in the internal database.
- DEC_MODEL = 8 --- Decreases the model to the previous available model in the internal database.
- SCRUB_FORWARD = 9 --- Skips forward in time of the given message. The amount skipped is decided by the interval, set in the database. Controllable by the webcfg frontend.
- SCRUB_BACKWARDS = 10 --- See 9. Decreases the time.
- RESTART = 11 --- Sets the time to 0, resetting the message.
- INC_LENGTH_SCALE = 12 --- Slows down the message, by the amount set in the database, configurable by the webcfg frontend. (requires resynthesis)
- DEC_LENGTH_SCALE = 13 --- See 12. Speeds up the message. (requires resynthesis)
```

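As a rough sketch of how a client might use these commands, the Python below mirrors the command codes in an enum and frames two packets. Several things here are assumptions and not taken from miTTenS itself: the helper names `frame` and `send`, the placeholder host and port (use whatever port your instance listens on), and the idea that commands without a payload are sent with an empty body.

```python
import socket
import struct
from enum import IntEnum


class Command(IntEnum):
    """Command codes copied from the list above."""
    PLAY_MESSAGE = 1
    PLAY_MESSAGE_PRIORITY = 2
    SET_LENGTH_SCALE = 3
    STOP_PLAYING = 4
    RESUME = 5
    SET_MODEL = 6
    INC_MODEL = 7
    DEC_MODEL = 8
    SCRUB_FORWARD = 9
    SCRUB_BACKWARDS = 10
    RESTART = 11
    INC_LENGTH_SCALE = 12
    DEC_LENGTH_SCALE = 13


def frame(command: Command, body: bytes = b"") -> bytes:
    """Length (u16, big-endian) | command (u8) | body."""
    return struct.pack(">HB", len(body), command) + body


def send(command: Command, body: bytes = b"",
         host: str = "127.0.0.1", port: int = 9999) -> None:
    """Send one datagram. The host and port defaults are placeholders."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(frame(command, body), (host, port))


# SET_LENGTH_SCALE carries a big-endian float32 instead of ASCII text.
scale_packet = frame(Command.SET_LENGTH_SCALE, struct.pack(">f", 1.5))
print(scale_packet.hex(" "))  # 00 04 03 3f c0 00 00

# Assumption: commands without a payload are sent with an empty body.
stop_packet = frame(Command.STOP_PLAYING)
print(stop_packet.hex(" "))  # 00 00 04
```

Calling `send(Command.PLAY_MESSAGE, b"hello")` would then queue a message, provided the placeholder port is replaced with the real one.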
# Configuration

Keybinds are configured in a custom file format. The keys go on the left side of the arrow, and the command goes on the right.

```
#comment
Alt_L, Super_L, A -> PLAY_MESSAGE_PRIORITY
Alt_L, Super_L, X -> STOP_PLAYING
Alt_L, Super_L, C -> RESTART
```

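To make the format concrete, here is a small Python sketch that reads keybind lines like the ones above. It is not the parser miTTenS uses. It assumes that blank lines are ignored, that comments occupy a whole line, and that whitespace around keys and commands is insignificant. `parse_keybinds` is a name made up for this example.

```python
def parse_keybinds(text):
    """Return a list of (keys, command) pairs from keybind config text."""
    bindings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keys_part, arrow, command = line.partition("->")
        if not arrow:
            raise ValueError(f"missing '->' in line: {raw!r}")
        keys = tuple(k.strip() for k in keys_part.split(",") if k.strip())
        bindings.append((keys, command.strip()))
    return bindings


example = """\
#comment
Alt_L, Super_L, A -> PLAY_MESSAGE_PRIORITY
Alt_L, Super_L, X -> STOP_PLAYING
Alt_L, Super_L, C -> RESTART
"""

for keys, command in parse_keybinds(example):
    print("+".join(keys), "=>", command)
```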

In addition, there is a webcfg frontend that should be accessible via screen readers.

![Control Panel](https://albassort.github.io/CDN/miTTenS/controlpanel.jpg)

## Thread Structure

![structure.png](https://albassort.github.io/CDN/miTTenS/structure.png)

miTTenS follows an event-based system for managing state.

The text in the above image:

- Threads
  - Network Thread: reads from the specified UDP port and sends the raw messages on to the Event Processing Thread.

  - Event Processing Thread: processes each event and sends multiple events to manage the state of the synthesis thread. It is a separate thread so that audio can play while requests are still being accepted.

  - Synthesis and Message Playing Thread: handles the Piper process and audio playback, and is manipulated by the other threads. It has to run on the original thread due to memory safety issues in ONNX Runtime, which Piper uses.

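The thread layout above can be pictured with a toy Python sketch. It only illustrates the hand-off pattern between the three threads and is not how miTTenS is implemented. The queues, the `STOP` sentinel, and the function names are all invented for the example, and a fixed list of datagrams stands in for the UDP socket.

```python
import queue
import threading

raw_messages = queue.Queue()  # network thread -> event processing thread
synth_events = queue.Queue()  # event processing thread -> synthesis loop
STOP = object()               # sentinel that shuts the pipeline down


def network_thread(datagrams):
    """Stand-in for reading from the UDP port: forwards raw messages."""
    for datagram in datagrams:
        raw_messages.put(datagram)
    raw_messages.put(STOP)


def event_thread():
    """Decodes raw messages into (command, body) events."""
    while True:
        message = raw_messages.get()
        if message is STOP:
            synth_events.put(STOP)
            return
        length = int.from_bytes(message[:2], "big")
        synth_events.put((message[2], message[3:3 + length]))


def synthesis_loop(handled):
    """Consumes events; a real program would synthesize and play audio here."""
    while True:
        event = synth_events.get()
        if event is STOP:
            return
        handled.append(event)


handled = []
workers = [
    threading.Thread(target=network_thread,
                     args=([b"\x00\x05\x01hello", b"\x00\x00\x04"],)),
    threading.Thread(target=event_thread),
]
for worker in workers:
    worker.start()
synthesis_loop(handled)  # runs on the original (main) thread
for worker in workers:
    worker.join()
print(handled)  # [(1, b'hello'), (4, b'')]
```

Keeping the synthesis loop on the main thread mirrors the constraint described in the list above.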
## Language detection

miTTenS supports language detection via a [custom algorithm](https://gitlab.com/IAlbassort/zipfs-law-language-detection) with no complex dependencies. It is disabled by default. Its accuracy is based on orthography, and thus it gets more accurate the longer the text is. Individual words are liable to snap back to their language of origin (e.g. "sauna" might be detected as Finnish) if there aren't significant spelling differences. Additionally, detection is less accurate between languages with similar orthographies: Ukrainian and Russian are likely to be confused, whereas Finnish and Swedish seldom will be.

## Setting up

Current system-dependent dependencies are as follows:

```
- cmake >= 13
- g++ CC >= 13
- sfml2
- sqlite3
- nim (>= 2.0.0)
```

It also requires a user in the input group who is not root. I recommend you make a specific user for this task.

```bash
sudo usermod -a -G input $user
```

---

```bash
git clone --recurse-submodules https://gitlab.com/CAlbassort/miTTenS
cd miTTenS
./install.sh
```

## Downloading and installing models

Models can be downloaded from [here](https://github.com/rhasspy/piper/blob/master/VOICES.md).

The install location is `/usr/local/miTTenS`, so models are installed there, specifically under `/usr/local/miTTenS/models/<language>`. The language code to use comes from [this list](https://gitlab.com/IAlbassort/zipfs-law-language-detection/-/blob/master/data/multiToEng.json?ref_type=heads) (the Wikipedia extension is used). Then drop each model's `.onnx` file into that directory, along with an identically named `.json` file.

For example:

```
/usr/local/miTTenS/models/
├── en
│   ├── en_US-danny-low.json
│   └── en_US-danny-low.onnx
└── fi
    ├── fi_FI-harri-medium.json
    └── fi_FI-harri-medium.onnx
```
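
If you want to double-check a models directory before starting miTTenS, a small Python sketch like the one below can help. It is not part of miTTenS. It assumes only the layout shown above (one directory per language code, each `.onnx` paired with a same-named `.json`), and `find_models` is a name made up for this example. The demo builds a throwaway directory so that it does not touch a real install.

```python
import tempfile
from pathlib import Path


def find_models(models_root):
    """Return ({language: [onnx names]}, [onnx paths lacking a .json])."""
    complete, missing_config = {}, []
    for lang_dir in sorted(p for p in Path(models_root).iterdir() if p.is_dir()):
        for onnx in sorted(lang_dir.glob("*.onnx")):
            # The layout above pairs each model with a same-named .json file.
            if onnx.with_suffix(".json").is_file():
                complete.setdefault(lang_dir.name, []).append(onnx.name)
            else:
                missing_config.append(f"{lang_dir.name}/{onnx.name}")
    return complete, missing_config


# Demo on a throwaway directory; the "fi" model deliberately lacks its .json.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("en/en_US-danny-low.onnx", "en/en_US-danny-low.json",
                "fi/fi_FI-harri-medium.onnx"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    complete, missing_config = find_models(tmp)

print(complete)
print(missing_config)
```

On a real install you would call `find_models("/usr/local/miTTenS/models")` instead.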