…or how I happened to plunge head-first into the most fascinating technological revolution we have witnessed since the Internet and the smartphone.
Back in my childhood days I binge-listened to a kids' radio play called „Robbi, Tobbi und das Fliewatüt“ – a story about a kid inventor who went on an adventure with a robot his "age" to solve three riddles the robot needed answered in order to succeed in "Robot Class". The robot specifically needed a human counterpart his age to solve the three riddles. After I finished the entire three-part cassette series I was filled with the urge to build a robot myself: a companion with whom I could also go on adventures – to the North Pole, to the black-and-yellow-striped lighthouse, or even to explore the secrets of a supposedly haunted castle. That very same week I promised my mom I would invent a robot for her that would help her with house chores so she would have more time for the things she actually liked to do. I would finish my robot before I turned 25.
40 years have passed.
Did I build that robot? No. Did I buy that robot? No.
But we still talk about it today, and she sometimes smiles when I report to her the innovations coming out of generative AI these days, reminding me it'd be about time I fulfilled my early promise. Let's just say I am still working on it, and we seem to be some time off yet – but secretly I honestly hope she'll get a chance to see "her robot" soon.
As things are currently developing, she might even have a chance.
Let’s recap: last year it seemed the world would be entering the Metaverse any time soon. Back in February 2022, I conceptualized a Roblox experience for our Sales & Service department at Deutsche Telekom. With no designer and not enough time or budget, I was hard pressed when we were asked to deliver virtual wallpapers for a building we wanted to construct within the confines of “Beatland”, our Telekom residence within Roblox. I tried to create these digital assets myself, but it quickly became apparent that I was nowhere near a professional designer. It must have been around that time that I discovered OpenAI’s DALL·E 2 by accident. I knew the company from my regular research on the web, but hadn’t been aware they were developing something that would kick off my personal AI journey: there was an empty text field waiting for me to enter some text, a description of what I wanted to see – as an IMAGE!
I was blown away by the prospect of entering text and receiving an image based on that description. Having been a copywriter for a long stretch of my professional past, the idea of writing precise copy to produce a precise outcome truly resonated with me. As I started experimenting with DALL·E 2, I immediately realized the immense potential this technology would hold for anyone able to write a coherent sentence. I wanted to do more, but then my credits ran out. It was just enough to create some decent digital wallpapers for our virtual world in Roblox, and our external partner as well as my colleagues were quite surprised when I produced these new digital assets. Where did you download them? Are they public domain? Can we legally use these? How much did you pay? The usual questions were mostly unnecessary: according to the ToS, we were allowed to use the results of my prompts (the textual inputs into the image-creating machine) commercially.
But what if I could create my own images? On my own computer? Would that even be possible with all the data and AI shenanigans going on in the backend? After all, OpenAI was a huge cloud-operated company – but maybe…?
Just a couple of months later I was able to install “Automatic1111”, the Stable Diffusion web UI that runs as a local server on my very own Windows PC and enabled me to prompt for my own images without any connection to an API or cloud instance. The first images were rather crude and wouldn’t have stood a chance against DALL·E 2. But just a month later a huge, global community had developed around Automatic1111. New extensions, plugins and modules were being developed almost by the day. Some were mediocre and quickly pieced together, but some were real game-changers. Soon new image-generation models popped up: so-called “Checkpoints”, later distributed in the “Safetensors” format. While Checkpoints were rather risky to download and install, Safetensors were…well, safer. I started experimenting with every model I could find, trying out various styles of models, negative prompts, LoRAs and new add-ons to the base server software.
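Why were Checkpoints risky while Safetensors were safer? Classic checkpoint files (.ckpt) are Python pickle archives, and unpickling can execute arbitrary code, whereas the safetensors format stores only raw tensor data plus a header. Here is a minimal, standard-library-only sketch of the pickle problem – the `Payload` class and the harmless `eval` call are my own illustration, not an actual attack:

```python
import pickle

# A pickle file doesn't just store data: while loading, pickle may call
# arbitrary callables named in the stream. A booby-trapped "model
# checkpoint" can exploit this. Here the payload is harmless arithmetic
# via eval(), but it could just as well be os.system(...).

class Payload:
    def __reduce__(self):
        # Tells pickle how to "reconstruct" this object at load time:
        # call eval("40 + 2"). Any importable callable would work.
        return (eval, ("40 + 2",))

blob = pickle.dumps(Payload())   # what a malicious .ckpt could contain
result = pickle.loads(blob)      # merely "loading the model" runs the code
print(result)                    # -> 42
```

A safetensors file, by contrast, is parsed as plain tensor bytes, so opening one cannot trigger code execution on load – which is exactly why the community migrated to the format.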
Watching forums grow and communities thrive showed me how these developments were catapulting the generative AI scene forward at an exponential pace. I was blown away by what people from all walks of life could achieve when working together on a common goal – globally! With the advent of multiple models popping up