SplitticAI uses the open-source GPT-2 model to create images and audio with nothing but text as input and output.
For image generation, the model is trained on the characters you get when the raw contents of an image file are read as text. If you were to open a JPEG in a text editor, you'd see this kind of data. SplitticAI generates new text of the same kind, which is then saved and interpreted as an image file.
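To make the "image as text" idea concrete, here is a minimal Python sketch of the round trip between file bytes and text. It assumes a Latin-1 mapping (one character per byte), which is my own choice for illustration. I don't know how SplitticAI actually encodes its data, and the function names are hypothetical.

```python
def bytes_to_text(data: bytes) -> str:
    """Turn raw file bytes into a string, one character per byte.

    Assumption: Latin-1 is used because it maps every byte value
    0-255 to exactly one character, so nothing is lost.
    """
    return data.decode("latin-1")


def text_to_bytes(text: str) -> bytes:
    """Turn text of that form back into raw bytes.

    Raises UnicodeEncodeError if the text contains a character
    outside the 0-255 range.
    """
    return text.encode("latin-1")


# The first four bytes of a typical JPEG (JFIF) file, as they would look as text.
jpeg_header = b"\xff\xd8\xff\xe0"
as_text = bytes_to_text(jpeg_header)
assert text_to_bytes(as_text) == jpeg_header
```

In a setup like this, the text a model generates would be passed to `text_to_bytes` and written to a `.jpg` file. Nothing guarantees that the resulting bytes form a valid image, so a decoder may well reject some outputs.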
The same approach extends to audio. SplitticAI generates an image of a spectrogram, which is then "played back" as audio, so both images and sound come out of a purely text-based process.
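One way a spectrogram image could be "played back" is additive synthesis: treat each pixel column as a time frame and each row as a frequency, then sum sine waves weighted by pixel brightness. The sketch below is my guess at the general idea, not SplitticAI's actual method. The function names, the 8 kHz sample rate, the frame length, and the linear frequency scale are all assumptions, and because a spectrogram image carries no phase information, the result is only a rough approximation of any original sound.

```python
import math
import struct
import wave


def spectrogram_to_samples(spec, sample_rate=8000, frame_len=256, max_freq=4000.0):
    """Convert a magnitude spectrogram into audio samples in [-1, 1].

    spec: list of time frames; each frame is a list of magnitudes
    (e.g. pixel brightness scaled to 0..1), index 0 = lowest frequency.
    All parameters are illustrative assumptions.
    """
    samples = []
    for t, column in enumerate(spec):
        n_bins = len(column)
        for i in range(frame_len):
            n = t * frame_len + i  # global sample index keeps each sine continuous
            value = 0.0
            for k, mag in enumerate(column):
                if mag > 0.0:
                    # Centre frequency of bin k on a linear scale.
                    freq = (k + 0.5) * max_freq / n_bins
                    value += mag * math.sin(2 * math.pi * freq * n / sample_rate)
            samples.append(value)
    # Normalise so the loudest sample has magnitude 1.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > 0.0:
        samples = [s / peak for s in samples]
    return samples


def write_wav(path, samples, sample_rate=8000):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav.writeframes(frames)
```

For example, `write_wav("out.wav", spectrogram_to_samples([[0, 1, 0, 0], [0, 0, 1, 0]]))` would produce two short tones, the second higher than the first.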
If the creator of SplitticAI happens to read this, I would love to pick your brain in a one-on-one discussion.