This project encodes an arbitrary file into video for transfer, to be decoded on the other end into the original file, with the best possible file integrity through compression.

ArbitraryFileVideoEncoder

What is this?

This project is intended to let you encode files as videos, upload those videos anywhere, and use them the same way you would use files.

It is a simple concept, and before video encoding it is perfectly doable.

Example output:

[Image: exampleoutput]

Machine Learning

A machine learning training method for reducing transmission corruption is implemented in training.nim. Due to current limitations with serialization in Arraymancer, loading a trained model is very buggy and thus not implemented.

Encoding Standard

The encoding is relatively simple right now.

The first 256 blocks are individual colors that serve as the key for the image, representing the values 0 - 255 respectively. Each time a block of one of those colors is identified, it can be looked up by its position among the first 256 blocks.

This means data retention depends on how distinct the individual colors are from one another, and on how well each block is preserved so that it still resembles its key color.
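The scheme above can be sketched in Python as follows. This is a minimal illustration, not the project's Nim implementation: the random palette, the squared RGB distance used for matching, and the names make_palette, encode, nearest_index, and decode are all assumptions made for the example.

```python
import random

def make_palette(seed=0):
    """Hypothetical stand-in for a .plte file: 256 distinct random RGB colors."""
    rng = random.Random(seed)
    palette, seen = [], set()
    while len(palette) < 256:
        color = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
        if color not in seen:
            seen.add(color)
            palette.append(color)
    return palette

def encode(data, palette):
    """Emit the 256 key blocks first, then one color block per input byte."""
    return list(palette) + [palette[b] for b in data]

def nearest_index(color, key):
    """Index of the key color with the smallest squared RGB distance."""
    return min(
        range(len(key)),
        key=lambda i: sum((c - k) ** 2 for c, k in zip(color, key[i])),
    )

def decode(blocks):
    """Read the key from the first 256 blocks, then match every later block to it."""
    key, payload = blocks[:256], blocks[256:]
    return bytes(nearest_index(color, key) for color in payload)

palette = make_palette()
blocks = encode(b"hello", palette)
assert decode(blocks) == b"hello"
```

Without compression the round trip is exact, because every payload block is identical to one key block. Once the video codec shifts the colors, a block can land closer to the wrong key color, which is where the corruption comes from.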

Serialization standard

Currently, debug data can be saved into binary files via tensorCeral. The .bin encoding is implemented in tensorCeral.nim and works like this. The first 9 bytes are encoded as 32 bit unsigned ints:

[Dimension X, Dimension Y, the length of the following array (as a redundancy)] * 3

The data that follows, until the end of the file, is stored as bytes, and its size equals the dimensions first specified.
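The description above leaves the exact header layout open to interpretation. The sketch below assumes one reading: the triple (X, Y, length) is written three times as nine little-endian 32-bit unsigned ints, followed by X * Y payload bytes. The byte order, the nine-value header, and the names pack_tensor and unpack_tensor are assumptions for illustration; tensorCeral.nim remains the authority on the real format.

```python
import struct

# Assumption: nine little-endian uint32 values, i.e. (X, Y, length) repeated 3 times.
HEADER = struct.Struct("<9I")

def pack_tensor(dim_x, dim_y, data):
    """Serialize a dim_x by dim_y byte array with a triplicated header."""
    if len(data) != dim_x * dim_y:
        raise ValueError("data length must equal dim_x * dim_y")
    return HEADER.pack(*([dim_x, dim_y, len(data)] * 3)) + bytes(data)

def unpack_tensor(blob):
    """Parse the header, check the three copies agree, and return (x, y, data)."""
    fields = HEADER.unpack_from(blob, 0)
    copies = [fields[i:i + 3] for i in range(0, 9, 3)]
    if not (copies[0] == copies[1] == copies[2]):
        raise ValueError("header copies disagree")
    dim_x, dim_y, length = copies[0]
    data = blob[HEADER.size:]
    if length != len(data) or length != dim_x * dim_y:
        raise ValueError("length field does not match payload")
    return dim_x, dim_y, data

blob = pack_tensor(2, 3, bytes(range(6)))
assert unpack_tensor(blob) == (2, 3, bytes(range(6)))
```

Storing the length alongside the dimensions lets a reader detect a truncated or damaged file before interpreting the payload.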

Usage

  • Implementation

Used to decode a .bin file to an output. It has graphing and an interactive pager when compiled with different settings. To get graph outputs, use -d:colordebug -d:graph, and for the pager, use -d:pager. Note that output can be "" or "-" for stdout. Usage: ./implementation [input.bin] [output] [originalfile]

training and cl have built-in help messages.

  • TensorCeral, which is only used within LDPC

An example usage to test cl encoding is:

./cl -e yourFile.zip outFile.bin bestpallet.plte; ./implementation outFile.bin yourFileClone.zip

Unit-Tests

Some unit tests have been implemented in unit-test.nim. If everything returns positive, then everything should be working.

What is bestpallet.plte?

It is the best palette I have generated randomly, used mostly as a reference and a starting point for all comparisons.

Corruption

Compressing these files and then decoding them with a simple color comparison algorithm causes transmission data corruption, which can be significant unless you have a good data redundancy algorithm.

Currently, with the included palette, we get roughly 2.7% - 6% corruption; that is the palette I used in the example image. I included some basic statistical sorting, but it only increases corruption, and LDPC codes seem to make it skyrocket.

2.7% corruption is too high to maintain the integrity of most corruption-tolerant data formats, and thus the current system is insufficient for practical use.
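To make figures like these concrete, here is a small Python sketch of one way to measure corruption. It assumes corruption means the fraction of byte positions that differ between the original and the decoded file; that definition and the name corruption_rate are assumptions, and the project may measure it differently.

```python
def corruption_rate(original, decoded):
    """Fraction of byte positions that differ; missing or extra bytes count as errors."""
    longest = max(len(original), len(decoded))
    if longest == 0:
        return 0.0
    mismatches = sum(a != b for a, b in zip(original, decoded))
    mismatches += abs(len(original) - len(decoded))
    return mismatches / longest

# One wrong byte out of four is 25% corruption.
assert corruption_rate(b"abcd", b"abxd") == 0.25
```

Under this per-byte assumption, 2.7% corruption of a 1 MiB file would be roughly 28,000 damaged bytes.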

Future

In the future, I would like to include some very important additions:

  1. R statistical systems, to better identify the incorrect colors
  2. Custom LDPC codes; this is the key
  3. Hand-made color palettes
  4. Refining the machine learning into a classification algorithm, with feedback through implementation

Once we can achieve data corruption low enough for an archive to survive 'transmission', we can start to look at more complex data structures.

Graphs

This is the graph output of trainingdata. It shows various models at different points in their development, along with their accuracy and efficiencies.

[Image: trainingdata]

This is a graph output of implementation, showing the inefficiencies of the current systems:

[Image: implementation]