AI Horizons: New Frontiers and Thoughtful Considerations – September 2023


The AI wave is not over yet: even if the stock market is off its highs, the momentum is here to stay for a long time. The big news of the month is probably that ChatGPT can now see, hear, and speak. ChatGPT now supports voice interaction, allowing users to have back-and-forth conversations with the assistant. This feature enables on-the-go interactions, such as requesting bedtime stories or settling dinner table debates. The voice capability is powered by a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech. Five different voices, created in collaboration with professional voice actors, are available to users. Users can opt into voice conversations via the mobile app settings and then access the feature by tapping the headphone button on the home screen.

On the vision side, users can show ChatGPT one or more images to explore content, troubleshoot issues, or analyze complex data. For example, they can snap pictures of their fridge to discuss meal plans or use images to get help with math problems. A drawing tool in the mobile app allows users to focus on specific parts of an image when interacting with ChatGPT. Image understanding is powered by multimodal GPT-3.5 and GPT-4 models, which apply language reasoning skills to a wide range of images, including photographs, screenshots, and documents containing both text and images.

The voice technology opens doors to many creative and accessibility-focused applications but also presents new risks, such as potential impersonation by malicious actors. Vision-based models bring challenges ranging from hallucinations to reliability of interpretation in high-stakes domains, which necessitates thorough testing before broader deployment.

Source: OpenAI’s blog.

Another milestone has been set by Microsoft, which launched the first release of Windows 11 with Copilot integrated, turning the OS into an AI-enabled platform. Honestly, this is the first real innovation in Windows since the release of Windows 10. Windows Copilot brings Bing Chat directly to the Windows 11 desktop. Appearing as a sidebar, it lets users control PC settings, launch apps, or answer queries. Essentially, Copilot serves as an AI-powered digital assistant, akin to Microsoft's original vision for Cortana. A few other tools have also been enhanced with AI. Paint receives Photoshop-like features, with support for transparency, layers, and an upcoming AI image generator called “Paint Cocreator” for creating images from text prompts. Snipping Tool’s AI upgrades enable text extraction and redaction from images, similar to features on Android and iOS. The Photos app now offers a background blur feature that automatically highlights the subject in photos.

But that’s not all: starting November 1st, Microsoft will release the long-anticipated Microsoft 365 Copilot, and I should be able to get my hands on this revolutionary piece of technology a few days before general availability. As previously announced, the add-on to E3 and E5 subscriptions will cost $30 per user and the minimum number of seats will be 300 (see the quick cost calculation after the feature list below). Just to give you a very brief overview of the native features:

  • Copilot in Outlook: Summarize email threads, match your writing style, and follow Teams meetings.
  • Copilot in Word: Request document summaries, rewrite paragraphs, and format text.
  • Copilot in Excel: Analyze and visualize data, quickly add formula columns, and access advanced analytics.
  • Copilot in Loop: Co-create prompts, organize team projects, and generate recaps for teammates.
  • Copilot in OneNote: Generate summaries, expand on your notes, and edit for clarity.
  • Copilot in Stream: Obtain video summaries, ask open-ended questions, and identify discussion topics.
  • Copilot in OneDrive: Ask questions related to a file or get a summary of the content without opening the file.
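
As a quick sanity check on what that licensing floor means in practice, here is a tiny calculation. The $30 figure is Microsoft's announced per-user, per-month add-on price and the 300-seat minimum is the one mentioned above; the monthly billing cadence in the snippet is my assumption about how the commitment adds up.

```python
# Minimum commitment implied by Microsoft 365 Copilot pricing,
# assuming the announced $30 per user per month and a 300-seat floor.
price_per_user_per_month = 30   # USD, announced add-on price
minimum_seats = 300             # minimum number of seats mentioned above

monthly_minimum = price_per_user_per_month * minimum_seats
annual_minimum = monthly_minimum * 12

print(f"Monthly minimum: ${monthly_minimum:,}")   # $9,000
print(f"Annual minimum:  ${annual_minimum:,}")    # $108,000
```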

Also, Copilot can be extended to interact with your own custom data, and Copilot Lab will help users work iteratively with AI: improving prompts, sharing them with coworkers, and getting inspired to work with AI in new ways.

Now, I really believe that within a three-year timeframe no new business application will be released without a copilot-like interface. After all, what is more natural than language for interacting with a digital solution? I also believe that UX/UI design will be tremendously impacted by LLMs and generative AI.

Amazon has been left at the margins by Google, Meta, and Microsoft; to try to close the gap, it invested $4 billion in Anthropic. This investment marks a notable step toward advancing artificial intelligence, particularly the reliability, interpretability, and human-centric orientation of large-scale AI models. Anthropic, recognized for its commitment to AI safety, is led by an experienced team drawn from fields such as machine learning, physics, policy, and product development. The company was founded by former members of OpenAI and focuses on developing general AI systems and large language models. As a public-benefit corporation aligned with the effective altruism movement, Anthropic presents a broader vision of harnessing AI for the greater good.

OpenAI announced DALL-E 3, to be released in October; I can’t wait to try it. More in next month’s post.

Curious about how much running Llama 2 on Azure would cost? Obviously it depends on the model you use, but roughly speaking you can start at around $7 per hour and land in the high $30s per hour. The official introduction of Llama 2 on Azure noted that fine-tuning, evaluating, and deploying Llama 2 models requires GPU compute of V100 / A100 SKUs.
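
To put those hourly figures into perspective, here is a minimal back-of-the-envelope sketch. The per-model hourly rates are placeholders I picked to match the rough range above, not official Azure prices, and the mapping of model size to rate is purely illustrative.

```python
# Back-of-the-envelope cost estimate for hosting Llama 2 on Azure GPU compute.
# Hourly rates are illustrative placeholders based on the rough
# $7/hour to high-$30s/hour range mentioned above, not official pricing.

HOURS_PER_MONTH = 730  # average hours in a month

# Hypothetical mapping of model size to an assumed hourly GPU cost (USD).
assumed_hourly_rate = {
    "llama-2-7b": 7.0,    # e.g. a single V100/A100-class VM
    "llama-2-13b": 15.0,  # placeholder
    "llama-2-70b": 38.0,  # e.g. a multi-GPU A100 SKU
}

def monthly_cost(model: str, utilization: float = 1.0) -> float:
    """Estimate the monthly cost of keeping an endpoint up.

    utilization: fraction of the month the endpoint is running
    (1.0 = always on, ~0.33 = business hours only).
    """
    return assumed_hourly_rate[model] * HOURS_PER_MONTH * utilization

for name in assumed_hourly_rate:
    print(f"{name}: ~${monthly_cost(name):,.0f}/month if always on")
```

Even the smallest configuration adds up to several thousand dollars a month if the endpoint stays up around the clock, which is worth keeping in mind before spinning one up casually.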

YouTube joined AI-assisted content creation with several initiatives. This is important for understanding how pervasive AI will be in content creation. Did I say will be? I had better say is. YouTube has introduced AI-powered tools to augment the creative process for its users. Among these is a feature known as “Dream Screen,” which allows creators to generate AI images and use them as video backdrops: by entering prompts, creators can generate backgrounds for their videos, making the platform more appealing and interactive for content creation. Dream Screen specifically aims to enhance YouTube Shorts by providing AI-generated videos and photos as backgrounds, potentially replacing many third-party editing apps. The feature not only generates AI imagery from prompts but also points to a future where creators can remix and edit their existing content with the aid of AI, fostering a new wave of creativity on the platform.

YouTube is also focusing on breaking language barriers with an AI dubbing feature. Developed with the expertise of the Aloud team from Google’s Area 120 incubator, it enables effortless dubbing of videos into various languages, broadening creators’ global audience reach. In addition, YouTube plans to roll out an AI feature in YouTube Studio that will recommend video topics and structures based on current trends, helping creators stay in tune with audience interests. A new mobile app named “YouTube Create” has also been announced to simplify the Shorts workflow, providing essential video editing tools to newcomers and experienced creators alike. Finally, a music recommendation system tailored to video descriptions is in the pipeline, aimed at simplifying soundtrack selection by suggesting suitable audio tracks based on a creator’s video description.

And Google is really not sleeping at all. The September 2023 update for Google Bard introduced several notable features. The major one is Bard Extensions, which enables Bard to interact with various Google apps and services such as Gmail, Docs, Drive, Google Maps, and YouTube, making it easier to retrieve information across these platforms. A new “Google it” button helps verify Bard’s responses against web-sourced information. Additionally, shared conversations with Bard can be expanded upon by others through public links. This update, powered by enhancements to the underlying PaLM 2 model, also extends some existing features to over 40 languages. Unfortunately, I cannot yet use many of these new features from Italy.

Source: Google.

In the meantime, the Azure teams are trying to ride the AI wave as best they can.

Azure Machine Learning improved support for larger data sizes and models. A default Responsible AI dashboard is now generated for the best AutoML model, and distributed training for tabular datasets in AutoML is now supported, improving the handling of larger datasets during training.
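
To make that concrete, below is a minimal sketch of how an AutoML tabular classification job is typically submitted with the azure-ai-ml (SDK v2) Python package. The workspace details, compute cluster name, and registered MLTable dataset are placeholders, and I am assuming the new distributed-training and Responsible AI dashboard behaviors apply on top of a standard job definition like this rather than requiring a different API.

```python
# Minimal AutoML tabular classification job with the azure-ai-ml SDK (v2).
# Workspace details, compute name, and dataset path are placeholders.
from azure.ai.ml import MLClient, Input, automl
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Classification task over a registered tabular (MLTable) dataset.
classification_job = automl.classification(
    compute="cpu-cluster",  # existing compute cluster (placeholder name)
    experiment_name="automl-tabular-demo",
    training_data=Input(type=AssetTypes.MLTABLE, path="azureml:my-training-data:1"),
    target_column_name="label",
    primary_metric="accuracy",
    n_cross_validations=5,
)

# Keep the run small; the limits below are illustrative.
classification_job.set_limits(timeout_minutes=60, max_trials=20, max_concurrent_trials=4)

returned_job = ml_client.jobs.create_or_update(classification_job)
print(f"Submitted: {returned_job.name} -> {returned_job.studio_url}")
```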

Source: Microsoft Azure.

Intel has seemingly lost the AI battle in favour of Nvidia. But in September Intel announced a new chip, due in December, that promises to run generative LLM chat models on your own laptop. Honestly, I don’t know how this is going to work, how the chip can be upgraded, or what model complexity it will be able to handle. Nevertheless, the news intrigued me and I wanted to take note.

Source: Reuters.
