# ArbitraryFileVideoEncoder
# What is this?
This project lets you encode arbitrary files as videos, upload those videos anywhere, and then use them the same way you would use the original files.

It is a simple concept and, before any lossy video encoding is applied, easily doable.

Example output:

![exampleoutput](https://cdn.discordapp.com/attachments/1003008044474564618/1015264426414329906/in0.png)
# Machine Learning
A machine learning training method for reducing transmission corruption is implemented in `training.nim`. Because of current limitations in Arraymancer's serialization, loading a trained model is very buggy and is therefore not implemented.
# Encoding Standard
The encoding is relatively simple right now.

The first 256 blocks are individual colors that serve as the key for the image, standing for the values 0 - 255 respectively.
Each time a block of one of those colors is identified, it can be referenced by its position among the first 256 blocks.

This means data retention depends on how distinct the individual colors are from one another, and on how well each color is preserved so that it still resembles its entry in the key.
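The lookup described above can be sketched in a few lines. This is an illustration in Python, not the project's Nim code: the names `encode`, `decode`, and `color_distance`, the squared RGB distance, and the example palette are all assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]  # (R, G, B), each component 0-255


def color_distance(a: Color, b: Color) -> int:
    """Squared Euclidean distance between two RGB colors (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(data: bytes, palette: Sequence[Color]) -> List[Color]:
    """Emit the 256 key blocks first, then one colored block per data byte."""
    if len(palette) != 256:
        raise ValueError("the key must contain exactly 256 colors")
    return list(palette) + [palette[b] for b in data]


def decode(blocks: Sequence[Color]) -> bytes:
    """Read the key from the first 256 blocks, then map every later block
    to the index of the closest key color."""
    key = blocks[:256]
    out = bytearray()
    for block in blocks[256:]:
        out.append(min(range(256), key=lambda i: color_distance(block, key[i])))
    return bytes(out)


# Hypothetical palette for illustration only (this is not bestpallet.plte).
palette = [(i, (i * 7) % 256, 255 - i) for i in range(256)]
blocks = encode(b"hello", palette)
restored = decode(blocks)
```

Because decoding picks the nearest key color, a block survives as long as compression moves it less than halfway toward any other key entry, which is why palette distinctness matters so much.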
# Serialization standard
Currently, debug data can be saved to binary files via tensorCeral. The `.bin` encoding is implemented in `tensorCeral.nim` and works like this:

The first 9 values are each encoded as a 32-bit unsigned integer:

[Dimension X, Dimension Y, the length of the following array (as a redundancy)] * 3

The rest of the file, up to its end, is the data in bytes, matching the dimensions specified in the header.
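A reader and writer for a layout like this might look as follows. This is a Python sketch under several assumptions that the description above does not confirm: little-endian byte order, a header that is the same triple written three times in a row, a length field that counts the payload bytes, and a majority vote across the three copies. The function names are hypothetical and do not come from `tensorCeral.nim`.

```python
import struct
from collections import Counter
from typing import Tuple

HEADER_FORMAT = "<9I"  # assumption: little-endian, 9 unsigned 32-bit values
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 36 bytes


def pack_tensor(dim_x: int, dim_y: int, payload: bytes) -> bytes:
    """Write the (X, Y, length) triple three times, then the raw bytes."""
    triple = (dim_x, dim_y, len(payload))
    return struct.pack(HEADER_FORMAT, *(triple * 3)) + payload


def unpack_tensor(blob: bytes) -> Tuple[int, int, bytes]:
    """Read the three header copies, majority-vote each field, return data."""
    values = struct.unpack_from(HEADER_FORMAT, blob, 0)
    copies = [values[i:i + 3] for i in range(0, 9, 3)]
    # For each field, take the value that most header copies agree on.
    dim_x, dim_y, length = (
        Counter(field).most_common(1)[0][0] for field in zip(*copies)
    )
    payload = blob[HEADER_SIZE:]
    if len(payload) != length:
        raise ValueError("payload length does not match the header")
    return dim_x, dim_y, payload


blob = pack_tensor(2, 3, bytes(range(6)))
dims = unpack_tensor(blob)
```

Repeating the header three times lets a reader recover the dimensions even if one copy is damaged, which is one plausible reading of "as a redundancy".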
# Usage
- Implementation

Decodes a `.bin` file to an output. It also offers graphing and an interactive pager when compiled with the corresponding flags: use `-d:colordebug -d:graph` for graph output and `-d:pager` for the pager.
Note that the output can be `""` or `-` for stdout.
Usage: `./implementation [input.bin] [output] [originalfile]`

`training` and `cl` have built-in help messages.

- TensorCeral: only used within LDPC

An example usage to test `cl` encoding is:

```bash
./cl -e yourFile.zip outFile.bin bestpallet.plte; ./implementation outFile.bin yourFileClone.zip
```
# Unit-Tests
Some unit tests are implemented in `unit-test.nim`.
If every test passes, everything should be working.
# What is bestpallet.plte?
It is the best palette I have generated randomly so far. It is used mostly as a reference and a starting point for all comparisons.
# Corruption
Compressing these files and then decoding them with a simple color comparison algorithm causes data corruption in transmission, which can be significant unless you have a good data redundancy algorithm.

Currently, with the included palette, we get roughly 2.7% - 6% corruption; that is the palette I used in the example image. I included some basic statistical sorting, but it only increases corruption, and LDPC codes seem to make it skyrocket.

A corruption rate of 2.7% is still too high to maintain the integrity of most corruption-tolerant data formats, so the result is not yet usable for practical purposes.
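To make the percentages concrete, here is one way such a rate could be measured, sketched in Python. It assumes corruption is counted per byte by comparing the original file against the decoded clone; the project may measure it differently, and `corruption_rate` is a hypothetical helper, not part of `implementation`.

```python
def corruption_rate(original: bytes, decoded: bytes) -> float:
    """Percentage of byte positions that differ between the two inputs.

    Missing or extra trailing bytes are counted as corrupted.
    """
    total = max(len(original), len(decoded))
    if total == 0:
        return 0.0
    mismatches = sum(a != b for a, b in zip(original, decoded))
    mismatches += abs(len(original) - len(decoded))
    return 100.0 * mismatches / total


original = bytes(range(200))
decoded = bytearray(original)
for position in (3, 50, 99, 150, 151, 199):
    decoded[position] ^= 0xFF  # flip every bit of six bytes
rate = corruption_rate(original, bytes(decoded))
```

Here six damaged bytes out of 200 give a `rate` of 3.0, which is already inside the range reported above.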
# Future
In the future, I would like to add some very important features:
1. R statistical systems, to better identify the incorrect colors
2. Custom LDPC codes; this is the key
3. Handmade color palettes
4. Refining the machine learning into a classification algorithm with feedback through implementation

Once we achieve corruption rates low enough for an archive to survive 'transmission', we can start to look at more complex data structures.
# Graphs
This is the graph output of trainingdata. It shows various models at different points in their development, along with their accuracy and efficiency.

![trainingdata](https://media.discordapp.net/attachments/1003008044474564618/1015256595061555220/visual.png?width=1057&height=561)

This is a graph output of implementation, showing the inefficiencies of the current systems:

![implementation](https://media.discordapp.net/attachments/1003008044474564618/1015256595485175808/implementation.png?width=1037&height=561)