AI horizons: new frontiers and thoughtful considerations – May 2024


May has been even more intense than the preceding months of this year, with several key events overlapping. Here are four notable events:

  1. The release of GPT-4o.
  2. Google’s developer conference.
  3. Microsoft’s Build conference, the key event for developers using Microsoft technologies.
  4. The official release of the EU AI Act on May 21.

Let’s start from the beginning, knowing that for some of these topics, especially all the announcements from Microsoft and a deeper dive into the AI Act, further analysis will be necessary.

Top of the Month

From my perspective, the two most important events this month are undoubtedly the release of what is currently the most advanced multimodal generative AI model, GPT-4o, and the European Union’s finalization of comprehensive AI legislation which, like the GDPR, will likely become a global benchmark.

From my observation, GPT-4o introduces two revolutionary capabilities. The first is Voice mode, where response times make it genuinely feel like conversing with a human. Voice mode is tied to a series of other features, most notably real-time translation across multiple languages. The second standout feature is its ability to analyze structured data and produce dynamically modifiable charts directly within the chat. The model is twice as fast as GPT-4 Turbo and costs half as much; since the release of GPT-4, the cost of the service has dropped to roughly one-sixth, while speed has increased by an estimated five to ten times. Another aspect of GPT-4o is its native vision capability, with improved comprehension across a variety of scenarios: environments, people, and on-screen content, with the ability to guide the user or suggest solutions to problems.
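
To make the new model concrete, here is a minimal sketch of a combined text-and-image request to GPT-4o. It assumes the standard OpenAI Python SDK with an OPENAI_API_KEY in the environment; the image URL is a placeholder.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4o to interpret a chart image and suggest a follow-up analysis.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart and suggest one further analysis."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```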


The second point is the official release of the EU AI Act on May 21. I have already written a primer on the AI Act, but it is useful to recall some key information here, as the AI Act will soon start producing its first real effects, even though it will only be fully in force in two years. First, to support its steering and enforcement functions, four new bodies have been established:

  1. An AI Office within the Commission to enforce common rules across the EU.
  2. A scientific panel of independent experts to support enforcement activities.
  3. An AI Board with representatives from member states to advise and assist the Commission and member states on the consistent and effective application of the AI Act.
  4. A stakeholder advisory forum to provide technical expertise to the AI Board and the Commission.

Additionally, here is the expected timeline:

  • June-July 2024 – The AI Act will be published in the Official Journal of the European Union.
  • 20 days later – The AI Act will officially enter into force. From this date, in compliance with Article 113, the following milestones will follow:
  • At 6 months – Chapter I and Chapter II (the ban on AI uses that pose unacceptable risk) will apply.
  • At 9 months – Codes of practice (Article 56) must be ready.
  • At 12 months – Chapter III Section 4 (notifying authorities), Chapter V (general-purpose AI models), Chapter VII (governance), Chapter XII (penalties), and Article 78 (confidentiality) will apply, with the exception of Article 101 (fines for GPAI providers).
  • At 24 months – The AI Act will apply in full, with the one exception below.
  • At 36 months – Article 6 (classification rules for high-risk AI systems) will apply.

Microsoft

The announcements from the AI giant, which remains the world’s most valuable company by market capitalization, were numerous and wide-ranging. I will delve more deeply into some implications of the announcements made at Build in a dedicated article. Meanwhile, here are the most impactful news items.

As previously mentioned, the two AI developments I believe will have the most direct impact on daily life are the advent of personal assistants and humanoid robots. The announcement of “Copilot+ PC” must be viewed in the context of personal assistants. It involves completely rethinking human/machine interaction in favor of written or spoken natural language, with an integrated assistant that helps perform daily tasks or gets us out of trouble. It is the next step in personal computing as we know it. Of particular interest is the fact that Microsoft is providing an API for developers (the Windows Copilot Runtime), so anyone developing applications on Windows can access the models underlying Copilot.

Despite its partnership with OpenAI, Microsoft’s new consumer AI group (led by the former CEO of the AI startup Inflection) is building its own in-house large language model (LLM), called MAI-1. MAI-1 will reportedly use about 500B parameters, putting it in direct competition with industry-leading LLMs like OpenAI’s GPT-4, Anthropic’s Claude, and Google’s Gemini. It is expected to leverage Inflection’s technology and to offer capabilities comparable to OpenAI’s GPT models, and it is distinct from Microsoft’s new small AI model (Phi-3), which is designed to run on smartphones.

I am convinced that AI will radically change our way of living, studying, and interacting. Among Microsoft’s many initiatives is the release of a free “Reading Coach” designed to provide students with AI-generated stories that meet all safety criteria.

Speaking of LLM safety, Microsoft is certainly at the forefront when it comes to tools that add external safeguards around models, checking both prompts (in) and completions (out) in order to intercept and block the various LLM attack techniques I have previously discussed.
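
As an illustration of the general pattern (not Microsoft’s actual API), here is a minimal sketch of an input/output guardrail wrapped around a model call: the prompt is screened before it reaches the model and the completion is screened before it reaches the user. The looks_like_injection heuristic and the call_model function are hypothetical placeholders.

```python
import re

# Hypothetical blocklist of phrasings commonly seen in prompt-injection attempts.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal (your )?system prompt",
]

def looks_like_injection(text: str) -> bool:
    """Very rough input screen: flag known prompt-injection phrasings."""
    return any(re.search(p, text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)

def guarded_completion(prompt: str, call_model) -> str:
    """Screen the prompt (in) and the completion (out) around any model call."""
    if looks_like_injection(prompt):
        return "Request blocked by input safeguard."
    completion = call_model(prompt)          # call_model is any LLM client function
    if "BEGIN SYSTEM PROMPT" in completion:  # toy output check for leaked instructions
        return "Response blocked by output safeguard."
    return completion
```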

Lastly, there was an inevitable investment in the humanoid robotics area. Microsoft announced a collaboration with Sanctuary AI, known for its humanoid robot called Phoenix, to develop a general-purpose humanoid robot. The two will work on developing Large Behavior Models (LBMs) that power general-purpose robots, allowing them to learn from the real world rather than from computer simulations. Microsoft will provide the Azure cloud infrastructure to power heavy AI workloads, and Sanctuary AI will bring its deep technical expertise and experience to the collaboration. Sanctuary AI’s robots have already been deployed in one of Canada’s largest retail chains and have been tested on 400 customer-related tasks across 15 different industries. Coming after Microsoft’s participation in the $675M Series B funding round for the AI robotics startup Figure AI in February, this partnership further strengthens the company’s commitment to AI development.

New Trends

The release of digital assistants to support code development continues. In May, Amazon released Q Developer, GitHub Copilot Workspace, and Mistral Codestral. These add to an already existing series of assistants (Devin, Google Duet, and Gemini in Android Studio and VS Code), making the choice and evaluation increasingly complex. For now, we remain with GitHub Copilot, but GitHub will need to continue adding value if it does not want to lose ground to open-source models.

GitHub Copilot Workspace, in technical preview, is a platform that facilitates development starting from a natural language description of what you want to achieve, suggesting a plan of action and applying the changes. Every step of the Workspace can be customized, recreated, or undone, allowing you to get the desired solution. It offers an integrated terminal and secure port forwarding for code verification and the ability to launch a Codespace to use GitHub’s native tools. It allows for immediate sharing of a workspace with the team for feedback and iterations, automatically tracking context and change history, and creating a PR with one click.

Amazon Q is a digital assistant for code development that provides everything developers need, including the ability to deploy the created solution on AWS. It works with popular IDEs like Visual Studio Code, JetBrains IntelliJ IDEA, and others. Amazon Q Developer can automate code upgrades and modernization (e.g., from Java 8 to Java 17), enhance security with vulnerability analysis, and propose code improvements.

Naturally, it can generate documentation, refactor code, and add new features based on developer descriptions. It also supports AWS support case management and integration with chatbots on Slack and Microsoft Teams.

Codestral is an AI coding assistant trained on over 80 programming languages, including Python, Java, C++, and JavaScript. It can write code, test functions, and answer questions about a codebase. Codestral can be used through Mistral’s ‘Le Chat’ chatbot, through an API for third-party development tools and apps, or via Hugging Face, under a license limited to research and testing. Mistral claims that Codestral outperforms existing AI coding assistants and is collaborating with industry partners like JetBrains, SourceGraph, and LlamaIndex.
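
For those who want to experiment locally, here is a minimal sketch of loading Codestral through Hugging Face transformers. The model id mistralai/Codestral-22B-v0.1 and the gated-access step are assumptions to verify against Mistral’s documentation, and the 22B weights require substantial GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model id; access is gated under Mistral's non-production license.
MODEL_ID = "mistralai/Codestral-22B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```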

However, these assistants are far from error-free or capable of writing software without vulnerabilities; human experience and the use of other tools for code verification are even more indispensable today.

Market News

Finally, Anthropic Claude is available in Europe. Previously, it was only accessible through AWS. This is excellent news because it is the only model today capable of competing with GPT-4.

Speaking of models, Google has showcased Project Astra, aiming to achieve what GPT-4o’s live vision can do today and perhaps include it in some wearables like glasses or similar devices.

During the first day of its I/O developer conference, Google announced a radical overhaul of its search functionality by integrating advanced AI-powered capabilities built on its Gemini LLM. The move aims to revolutionize how users interact with search results but has also raised concerns. Google’s new AI Overview feature, designed to provide summaries at the top of search results, has faced widespread criticism. Users have reported instances where AI-generated content was inaccurate, misleading, and even dangerous. The tool erroneously cited satirical articles and Reddit jokes, leading to factual errors that sparked debate about the reliability of Google’s AI systems. For example, the AI Overview inaccurately stated that Barack Obama was a Muslim president, recommended adding glue to pizza sauce, and advised staring at the sun for health benefits.

Google quickly addressed these issues, but the damage was already done. Haste was not a good advisor, and Google’s rush in the AI race (and to think that until a couple of years ago Google was considered the leader in this field) led to too many errors. Google’s ability to maintain user trust is crucial to preventing a shift to alternative search engines. An erosion of trust in Google Search could negatively impact advertiser performance, websites’ organic traffic, and the company’s revenue streams. To mitigate concerns about AI-powered search summarizing information without appropriate context, Google stated that AI Overviews will not be shown for every search; instead, they will appear when queries are complex. Additionally, earlier tests indicate that users still prefer visiting websites for a human perspective.

It is no coincidence that increasingly persistent rumors suggest that OpenAI is considering its own search engine. We shall see.

Also from Google’s developer conference:

  • Ask Photos: Can explore your Google Photos library and answer questions like “What is my license plate number?”
  • AI personal assistants: Can check emails, fill out forms, and organize appointments to facilitate complex tasks, like designing clothes.
  • Gemini Nano: Operates on mobile devices and uses AI to detect and warn about suspected scam calls.
  • LearnLM: Is a family of language models trained with educational research for teachers and learners.
  • Gemini 1.5 Pro and Gemini 1.5 Flash: These are new versions of Gemini. Pro is designed to support the new features shown at I/O, while Flash is a lighter, faster variant (a minimal API sketch follows this list).
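
For readers who want to try the new models, here is a minimal sketch using the google-generativeai Python SDK; the model name string and the assumption that your API key is exported as GOOGLE_API_KEY should be checked against Google’s current documentation.

```python
import os
import google.generativeai as genai

# Assumes an API key from Google AI Studio exported as GOOGLE_API_KEY.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# "gemini-1.5-flash" targets the lighter, faster variant announced at I/O.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Summarize the main announcements from Google I/O 2024 in three bullets."
)
print(response.text)
```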

Legal & Compliance

Beyond the EU AI Act discussed above, May was a month of significant movement for Meta and OpenAI, both of which revised the internal bodies that oversee the models they develop.

Six months after dissolving the ‘Responsible AI’ team, Meta CEO Mark Zuckerberg has established a product advisory council to guide the company’s AI efforts. This council, composed of renowned executives like Patrick Collison, Nat Friedman, Tobi Lütke, and Charlie Songhurst, is tasked with providing insights and recommendations on technological advancements, innovation, and strategic growth opportunities. Notably, these advisors are not elected by shareholders, are unpaid, and operate independently of the board, without any legal liability. This initiative aligns with Zuckerberg’s ambitious plan to invest $35 billion in AI-focused products, positioning Meta to potentially lead the global AI industry, despite recognizing that immediate results may not be imminent.

Conversely, OpenAI has established a new safety and security committee in response to internal turmoil, particularly the loss of several employees focused on safety who were concerned about prioritizing ‘shiny products’ over safety. This committee, led by CEO Sam Altman and comprising three board members and other internal technical and policy experts, will review OpenAI’s safety processes over the next 90 days, presenting their findings and recommendations to the board for further action. Unlike Meta’s external product advisory council, OpenAI’s committee is entirely internal, raising doubts about its ability to objectively address the safety concerns that led to its formation.

The contrast between the two companies’ strategies is stark: Meta’s external product advisory council aims to promote innovation and strategic growth, guided by industry leaders, while OpenAI’s internal committee is a response to internal safety concerns, tasked with scrutinizing the company’s practices. Meta’s approach suggests a focus on external validation and future-oriented growth, while OpenAI’s method highlights a response to internal tensions. Both strategies reflect their respective corporate cultures and immediate challenges, showing different paths towards achieving AI industry leadership.

At the international AI safety summit in Seoul, 16 major AI companies, including Amazon, Google, Microsoft, Meta, and OpenAI, agreed on the “Frontier AI Safety Commitments”. These commitments involve the safe development and deployment of their AI models, publishing safety frameworks to measure risks, and setting thresholds to identify when risks become intolerable. The companies committed to taking responsibility and refraining from deploying AI models if they fail to keep risks below these thresholds. This unprecedented global agreement among major AI companies aims to ensure accountability and transparency in AI development.

Scientific

A team of researchers at Ohio State University has created CURE, an AI model that can accurately assess the effects and efficacy of pharmaceutical treatments without the need for clinical trials. The model is based on anonymized health records of over 3 million patients, enabling it to gain a deep understanding of patient characteristics. CURE outperformed seven other leading AI models in estimating treatment efficacy, with 7-8% improvements on key indicators. AI predictions closely matched clinical trial results in tests, showing potential for generating insights that simplify drug trials. With the ability to process large medical datasets, CURE represents a significant advancement towards systems that can reliably assess drug efficacy in the real world, potentially speeding up the discovery of new treatments without the prolonged costs and times of traditional clinical trials.

In the medical field, researchers at UC San Francisco have developed an innovative brain implant that leverages AI to help a stroke survivor communicate seamlessly in both Spanish and English by interpreting brain activity. This bilingual implant was tested on a patient who lost the ability to speak after suffering a stroke at age 20. The AI-powered decoding system was trained to recognize the patient’s brain activity patterns while articulating words in both languages. Remarkably, the system was able to determine the patient’s intended language with 88% accuracy and correctly identify the phrase 75% of the time. This groundbreaking implant allows the patient to participate in bilingual conversations and switch between languages, despite not having learned English until after the stroke. This research exemplifies the growing ability of AI to interpret brain waves, potentially unlocking a wide range of new insights, treatments, and technologies. It also marks a significant advancement in facilitating communication for stroke victims and overcoming language barriers in the process.

Interestingly, LLMs are so complex that they are not “transparent” or explainable, something legislators would like them to be, a demand that reveals a deep ignorance of the state of the technology. Even more curious is that, to understand how these models work, researchers are now applying techniques similar to those used to identify specialized regions of the human brain. Both OpenAI and Anthropic researchers are trying to understand how shutting down certain parts of the models can change responses, for example inhibiting certain types of answers or “reasoning”. Anthropic’s latest research, called “Scaling Monosemanticity,” identifies bundles of activations that can be turned on or off to probe model behavior. Anthropic has successfully identified and mapped millions of human-interpretable concepts, known as “features,” within Claude’s neural networks. Researchers used a technique called ‘dictionary learning’ to isolate patterns corresponding to a wide range of concepts, from tangible objects to abstract ideas. By modifying these patterns, the team demonstrated the ability to alter Claude’s outputs, paving the way for more controllable AI systems. They also mapped concepts relevant to AI safety, such as deceptive behavior and power-seeking tendencies, providing insight into how these models understand and could potentially manifest such behaviors.
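
To make “dictionary learning” slightly more concrete, here is a toy sketch (not Anthropic’s actual code) of a sparse autoencoder trained on model activations: the hidden units play the role of candidate “features,” and the L1 penalty pushes each activation to be explained by only a few of them. The dimensions and data are invented for illustration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary-learning model: activations -> sparse feature code -> reconstruction."""
    def __init__(self, d_model: int = 512, n_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse, non-negative feature activations
        recon = self.decoder(features)
        return recon, features

# Stand-in for residual-stream activations collected from an LLM.
acts = torch.randn(10_000, 512)

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

for step in range(200):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```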

Various

Everyone talks about prompting and prompt engineering, sometimes inappropriately, but AI itself can help us write more effective prompts. Now that Anthropic has landed in Europe, we can access all the available tools (without having to resort to proxies and virtual US phone numbers), including a support tool for creating prompts:

  1. Go to Anthropic’s Console and sign up/log in.
  2. Select the ‘Generate a prompt’ option to access the prompt creation interface.
  3. Clearly describe your goal or task in the provided text box and click ‘Generate Prompt’ to allow Anthropic to create your optimized prompt.
  4. Select ‘Start Editing,’ replace {variables} with your specific details, and click ‘Run’ once your prompt is ready.
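
Once you have a generated prompt, you can also run it programmatically. Here is a minimal sketch using the Anthropic Python SDK; the model name and the example {variables} substitution are placeholders to adapt to your own case.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Prompt produced by the console generator, with its {variables} filled in.
prompt = "Summarize the following meeting notes for an executive audience:\n\n{notes}"
prompt = prompt.replace(
    "{notes}",
    "Q2 roadmap review, hiring freeze lifted, new EU data-residency requirement.",
)

message = client.messages.create(
    model="claude-3-opus-20240229",  # assumed model id; check Anthropic's docs for current names
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```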

