Andreas Braun, CTO at Microsoft Germany, has confirmed that GPT-4 is coming within a week of March 9, 2023, and that it will be multimodal. Multimodal AI means it will be able to work with multiple types of input, such as video, images, and audio.
Large multimodal language models
The big takeaway from the announcement is that GPT-4 is multimodal (SEJ predicted in January 2023 that GPT-4 would be multimodal).
A modality refers to the type of input that (in this case) a large language model can handle.
Modalities can include text, speech, images, and video.
GPT-3 and GPT-3.5 operate in only one modality: text.
According to the German news report, GPT-4 may be able to operate in at least four modalities: images, sound (audio), text, and video.
Quoted from Dr. Andreas Braun, CTO Microsoft Germany:
“We’ll be introducing GPT-4 next week, where we’ll have multimodal models that offer completely different capabilities – for example videos…”
The reporting lacks GPT-4 specifics, so it is unclear whether what was shared about multimodality applies specifically to GPT-4 or to multimodal models in general.
Microsoft Director of Business Strategy Holger Kenn explained multimodality, but the reporting was not clear whether he was referring to GPT-4’s multimodality or to multimodality in general.
I believe his references to multimodality were specific to GPT-4.
Here is what the news report said:
“Kenn explained what multimodal artificial intelligence is all about, which can translate text not only into images, but also into music and video.”
Another interesting fact is that Microsoft is working on “confidence metrics” in order to ground its AI with facts and make it more reliable.
Something that seems not to have been reported in the US is that Microsoft released a multimodal language model called Kosmos-1 at the beginning of March 2023.
According to the German news site Heise.de:
“…the team subjected the pre-trained model to various tests, and achieved good results on image classification, answering questions about image content, automated image tagging, visual text recognition, and speech generation tasks.
… Visual reasoning, that is, drawing conclusions about images without using language as an intermediate step, seems to be the key here…”
Kosmos-1 is a multimodal model that integrates the modalities of text and images.
GPT-4 goes further than Kosmos-1 by adding a third modality, video, and it also appears to include an audio modality.
Works across multiple languages
GPT-4 seems to work across all languages. It is described as being able to receive a question in German and answer it in Italian.
This is an odd example because who would ask a question in German and want an answer in Italian?
This is confirmed:
“…the technology has reached the point that it ‘works in all languages’: you can ask a question in German and get an answer in Italian.
With multimodality, Microsoft (-OpenAI) will ‘make models universal.’”
I believe the breakthrough point is that the model transcends language through its ability to pull knowledge from across different languages. So if the answer is in Italian, it will know it and be able to provide the answer in the language in which the question was asked.
This would make it similar to the goal of Google’s multimodal AI, called MUM. MUM is said to be able to provide answers in English for which the data exists only in another language, such as Japanese.
There is no announcement yet of where GPT-4 will appear, but Azure-OpenAI was mentioned specifically.
Google is fighting to catch up with Microsoft by integrating competing technology into its search engine. This development exacerbates the perception that Google is lagging behind and lacking leadership in consumer-facing AI.
Google is already integrating AI into many products such as Google Lens, Google Maps, and other areas where consumers interact with Google.
But it is Microsoft that is doing it in the most visible way.
Read the original German report here:
Featured image by Shutterstock / Master1305