Caroline/File-To-Video-Encoder


This is a project to encode an arbitrary file into visual data for transfer, to be decoded on the other end into the original file, while preserving as much file integrity as possible through compression.


user 90932b6655 updated readme		2024-11-27 17:32:20 -05:00
archivedata	init	2022-09-02 12:09:27 -04:00
in	missing folders	2022-09-02 12:21:15 -04:00
model	missing folders	2022-09-02 12:21:15 -04:00
out	missing folders	2022-09-02 12:21:15 -04:00
outvideos	missing folders	2022-09-02 12:21:15 -04:00
trainingdata	missing folders	2022-09-02 12:21:15 -04:00
trainingstats	missing folders	2022-09-02 12:21:15 -04:00
.gitkeep	missing folders	2022-09-02 12:21:15 -04:00
bestpallet.plte	init	2022-09-02 12:09:27 -04:00
cl.nim	init	2022-09-02 12:09:27 -04:00
cl.nims	init	2022-09-02 12:09:27 -04:00
compile.sh	init	2022-09-02 12:09:27 -04:00
encodeLDPC.sh	init	2022-09-02 12:09:27 -04:00
implementation.nim	init	2022-09-02 12:09:27 -04:00
install.sh	init	2022-09-02 12:09:27 -04:00
README.md	updated readme	2024-11-27 17:32:20 -05:00
tensorCeral.nim	init	2022-09-02 12:09:27 -04:00
testCerials.sh	init	2022-09-02 12:09:27 -04:00
tovideo.sh	init	2022-09-02 12:09:27 -04:00
training.nim	init	2022-09-02 12:09:27 -04:00
unit-test.nim	init	2022-09-02 12:09:27 -04:00

README.md

ArbitraryFileVideoEncoder

What is this?

This project is intended to let you encode files as videos, upload those videos anywhere, and use them the same way you would use files.

The concept is simple and, before video encoding (and the compression that comes with it) is applied, easily doable.

Example output:

Machine Learning

A machine learning training method for reducing transmission corruption is implemented in training.nim. Because of current limitations in Arraymancer's serialization, loading the model is very buggy and is thus not implemented.

Encoding Standard

The encoding is relatively simple right now.

The first 256 blocks are individual colors. They are used as the key for the image and stand for the values 0 - 255 respectively. Each time a block of one of those colors is identified, it can be referenced by its position among the first 256.

This means data retention depends on how distinct the individual colors are from one another, and on how well each color is preserved so that it still looks like itself after compression.
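The keying scheme described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's Nim code: the names make_palette, encode, decode and nearest_index are hypothetical, the random palette is only a stand-in for bestpallet.plte, and squared RGB distance is an assumed choice for the "simple color comparison".

```python
import random


def make_palette(seed=0):
    """Build 256 distinct RGB colors (hypothetical stand-in for bestpallet.plte)."""
    rng = random.Random(seed)
    colors = set()
    while len(colors) < 256:
        colors.add((rng.randrange(256), rng.randrange(256), rng.randrange(256)))
    return sorted(colors)


def encode(data, palette):
    """Emit the 256 key blocks first, then one colored block per data byte."""
    return list(palette) + [palette[byte] for byte in data]


def nearest_index(color, key_blocks):
    """Return the position of the key block closest to `color` (squared RGB distance)."""
    def distance(i):
        return sum((a - b) ** 2 for a, b in zip(color, key_blocks[i]))
    return min(range(len(key_blocks)), key=distance)


def decode(blocks):
    """Read the key from the first 256 blocks, then map every later block to a byte."""
    key_blocks, payload = blocks[:256], blocks[256:]
    return bytes(nearest_index(color, key_blocks) for color in payload)
```

Without any compression in between, decode(encode(data, palette)) returns the original bytes. Once the blocks are altered by video compression, a block can end up closer to the wrong key color, which is where the corruption discussed later comes from.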

Serialization standard

Currently, debug data can be saved to binary files via tensorCeral. The .bin encoding is defined in tensorCeral.nim and works like this: the first 9 bytes are encoded as 32-bit unsigned integers:

[Dimension X, Dimension Y, The length of the following array [as a redundancy]] * 3

The data that follows, until the end of the file, is the array in bytes, matching the dimensions specified first.
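As a rough illustration of the redundancy check, here is a Python sketch of one possible reading of the layout. It is not the tensorCeral.nim code. It assumes a single header of three little-endian 32-bit unsigned integers (X, Y, length) and one byte per element; the `* 3` repetition, the byte order and the exact header size are not confirmed by this README and are not modeled. write_bin and read_bin are hypothetical names.

```python
import struct

# Assumed header: three little-endian uint32 values (X, Y, payload length).
HEADER = struct.Struct("<3I")


def write_bin(dim_x, dim_y, payload):
    """Prefix the payload with its dimensions and its length as a redundancy."""
    if len(payload) != dim_x * dim_y:
        raise ValueError("payload size does not match the dimensions")
    return HEADER.pack(dim_x, dim_y, len(payload)) + payload


def read_bin(blob):
    """Parse the header and verify the stored length against dimensions and data."""
    dim_x, dim_y, length = HEADER.unpack_from(blob, 0)
    payload = blob[HEADER.size:]
    if length != dim_x * dim_y or length != len(payload):
        raise ValueError("header and payload disagree")
    return dim_x, dim_y, payload
```

Storing the length next to the dimensions lets a reader detect a truncated or damaged file before trusting the data.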

Usage

Implementation

Used to decode a .bin file to an output. It has graphing and an interactive pager when compiled with different settings. To get graph outputs, compile with -d:colordebug -d:graph; for the pager, compile with -d:pager. Note that output can be "" or "-" for stdout.

Usage: ./implementation [input.bin] [output] [originalfile]

training and cl have built-in help messages.

tensorCeral is only used within LDPC.

An example usage to test cl encoding is:

./cl -e yourFile.zip outFile.bin bestpallet.plte; ./implementation outFile.bin yourFileClone.zip

Unit-Tests

Some unit tests have been implemented in unit-test.nim. If everything returns positive, then everything should be working.

What is bestpallet.plte?

It is the best palette I have randomly generated, used mostly as a reference and a starting point for all comparisons.

Corruption

Compressing these files and then decoding them with a simple color comparison algorithm is going to cause data corruption in transmission, which can be pretty significant unless you have a good data redundancy algorithm.

Currently, with the included palette, we get roughly 2.7% - 6% corruption; that is the palette I used in the example image. I included some basic statistical sorting, but it only increases corruption, and LDPC codes seem to make it skyrocket.

Even 2.7% is too high to maintain the integrity of most corruption-tolerant data formats, and thus the result is insufficient for practical use.
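To put a number on a round trip yourself, a byte-level comparison like the following Python sketch can be used. It is only an illustration: corruption_rate is a hypothetical helper, and the README does not state how the percentages above were measured, so this metric (fraction of differing byte positions) is an assumption.

```python
def corruption_rate(original: bytes, decoded: bytes) -> float:
    """Fraction of byte positions that differ; missing or extra bytes count as corrupted."""
    longest = max(len(original), len(decoded))
    if longest == 0:
        return 0.0
    # Count mismatches over the overlapping part, then add the length difference.
    mismatches = sum(a != b for a, b in zip(original, decoded))
    mismatches += abs(len(original) - len(decoded))
    return mismatches / longest
```

For example, reading yourFile.zip and yourFileClone.zip in binary mode and passing both to corruption_rate gives a fraction that can be compared against the range quoted above.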

Future

In the future, I would like to include some very important things:

R statistical systems, to better identify the incorrect colors
Custom LDPC codes; this is the key
Hand-made color palettes
Refine the machine learning into a classification algorithm with feedback through implementation

Once we can achieve data corruption low enough for an archive to survive 'transmission', we can start to look at more complex data structures.

Graphs

The graph output of trainingdata. This graph shows various models at different points in their development, along with their accuracy and efficiency.

This is a graph output of implementation, showing the inefficiencies of the current systems:
