Abhishek Singh's Musings: Large Language Models to Large Action Models - Step towards Artificial General Intelligence

It’s almost the end of January, and I am still getting Happy New Year messages on WhatsApp. I still feel that I haven’t replied to many of the greetings I have received from friends and well-wishers over the last 3 weeks. While I was trying to reply to some of those messages, I got a notification about a LinkedIn post by a friend who wrote about Large Action Models (LAMs). The post made me read more about LAMs, as they were something new compared to the Large Language Models (LLMs) that most of us know about and use regularly: ChatGPT, Bard and other models that generate text responses based on the prompts given. In contrast, Large Action Models go a step further: they enable actions and complete the task that is assigned or asked for. An example is booking a flight ticket, or even booking a complete vacation, which may include flight tickets, hotel bookings, meals etc.

Large Action Models (LAMs), an evolution of Large Language Models (LLMs), represent a significant development in artificial intelligence (AI). Unlike LLMs, which generate text based on predictions, LAMs act as autonomous 'agents' capable of performing tasks and making decisions. These models, tailored for specific applications and human actions, employ neuro-symbolic programming to replicate a variety of tasks seamlessly, eliminating the need for initial demonstrations. LAMs interact with the real world through integration with external systems, such as IoT devices, enabling them to perform physical actions, control devices, retrieve data, and manipulate information. Their capabilities include understanding complex human goals expressed in natural language, adapting to changing circumstances, and collaborating with other LAMs. Notable use cases span the healthcare, finance, and automotive sectors, where LAMs can support diagnostics, risk measurement, and self-driving vehicles.

LAMs became a buzzword when a company called Rabbit introduced its LAM device, the R1, at the recently held CES 2024. The Rabbit R1, priced at $199, was launched as a small palm-sized device that could do simple tasks better than a smartphone, enhancing one’s digital experience. The device, powered by Rabbit OS and a LAM, acts as an AI assistant, capturing photos and videos and interacting with users naturally. The Rabbit R1 can edit your spreadsheets, save them and mail them to the people you ask it to. It can also manage your social media by creating the posts you want and sharing them across social media platforms. LAMs are poised to play a pivotal role in the future of AI, transforming language models into real-time action companions. Real-world applications like Rabbit demonstrate the potential of LAMs to revolutionize user interaction and shape the landscape of AI.

Large Action Models (LAMs) differ from Large Language Models (LLMs) in their capabilities and functionalities:

Task Execution

LLMs are primarily focused on generating human-like text based on input data. They excel in natural language understanding and text generation but do not inherently perform tasks or actions. LAMs, on the other hand, are designed to go beyond text generation. They act as autonomous agents capable of executing tasks, making decisions, and interacting with the real world.

Autonomy

LLMs generate text responses based on patterns learned during training but lack the autonomy to perform actions or make decisions beyond text generation. LAMs have the ability to act autonomously. They can connect to external systems, control devices, retrieve data, and manipulate information, allowing them to perform complex tasks without human intervention.

Integration with External Systems

LLMs typically operate within a closed system and do not have direct integration with external systems or devices. LAMs interact with the real world by integrating with external systems, such as IoT devices. This enables them to perform physical actions and engage with the environment in a way that goes beyond text-based interactions.
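To make the idea of integration a little more concrete, here is a minimal sketch in Python of how a structured "action" emitted by a model could be routed to an external device. Everything in it is an assumption for illustration: the `Thermostat` class is an in-memory stand-in for a real IoT device, and `ActionRouter` and the action format are hypothetical names, not the API of Rabbit or any other LAM product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Thermostat:
    """Hypothetical stand-in for an IoT device; it only keeps state in memory."""
    target_celsius: float = 20.0

    def set_temperature(self, celsius: float) -> str:
        self.target_celsius = celsius
        return f"thermostat set to {celsius} C"


@dataclass
class ActionRouter:
    """Maps action names (as a model might emit them) to device functions."""
    handlers: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, handler: Callable[..., str]) -> None:
        self.handlers[name] = handler

    def execute(self, action: dict) -> str:
        name = action.get("name")
        if name not in self.handlers:
            # Refuse anything that was not explicitly registered.
            return f"unknown action: {name}"
        return self.handlers[name](**action.get("args", {}))


thermostat = Thermostat()
router = ActionRouter()
router.register("set_temperature", thermostat.set_temperature)

# Assumed shape of a model's structured output for "make it a bit warmer".
result = router.execute({"name": "set_temperature", "args": {"celsius": 22.5}})
print(result)
```

The point of the registry is that the model only proposes an action by name, and the surrounding program decides which real-world functions it is allowed to call.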

Goal-Oriented Interaction

LLMs are primarily focused on generating coherent and contextually relevant text based on input prompts but lack a goal-oriented approach to tasks. LAMs are designed to understand complex human goals expressed in natural language and translate them into actionable steps. They can respond in real-time and adapt to changing circumstances.
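The vacation example from the start of this post can be used to sketch what "translating a goal into actionable steps" and "adapting to changing circumstances" might look like in code. This is only a toy illustration under stated assumptions: the keyword-based `plan` function stands in for whatever planning a real LAM does, and the step names, executors and fallbacks are all hypothetical.

```python
from typing import Callable, Dict, List


def plan(goal: str) -> List[str]:
    """Toy planner: a keyword match standing in for the model's reasoning."""
    text = goal.lower()
    steps = []
    if "flight" in text or "vacation" in text:
        steps.append("book_flight")
    if "hotel" in text or "vacation" in text:
        steps.append("book_hotel")
    if "meal" in text or "vacation" in text:
        steps.append("reserve_meals")
    return steps


def run(steps: List[str],
        executors: Dict[str, Callable[[], bool]],
        fallbacks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Execute steps in order; try a fallback when a step fails."""
    log = []
    for step in steps:
        if executors[step]():
            log.append(f"{step}: done")
        elif step in fallbacks and fallbacks[step]():
            log.append(f"{step}: done via fallback")
        else:
            log.append(f"{step}: failed")
            break  # stop instead of continuing with a broken plan
    return log


# Hypothetical executors: the first-choice hotel is sold out.
executors = {
    "book_flight": lambda: True,
    "book_hotel": lambda: False,
    "reserve_meals": lambda: True,
}
# Hypothetical fallback: book a different hotel instead.
fallbacks = {"book_hotel": lambda: True}

steps = plan("Book a complete vacation for next month")
log = run(steps, executors, fallbacks)
print(log)
```

Here the sold-out hotel does not derail the whole task: the run falls back to an alternative and carries on with the remaining steps, which is the kind of adaptation the paragraph above describes.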

Applications

LLMs are commonly used for natural language understanding, text generation, and various language-related tasks. In contrast, LAMs are applied in diverse domains such as healthcare, finance, automotive, and more, where their ability to perform tasks has practical applications. For example, LAMs can aid in diagnostics and risk measurement, and can even operate self-driving vehicles.

In essence, while LLMs excel in language-related tasks and text generation, LAMs extend these capabilities by combining language fluency with the capacity to autonomously execute tasks and make decisions, representing a significant advancement in the field of artificial intelligence.

It’s interesting to look at the potential use cases of LAMs. Some examples include:

1. Healthcare:

Diagnostics: LAMs can analyze medical data, including imaging scans, to assist in diagnosing diseases.
Treatment Strategy: LAMs can recommend personalized treatment plans based on patient data and medical knowledge.

2. Finance:

Risk Measurement: LAMs can assess and analyze financial risks, providing insights for investment decisions.
Fraud Detection: LAMs can identify patterns indicative of fraudulent activities in financial transactions.

3. Automotive:

Self-Driving Vehicles: LAMs can control and navigate autonomous vehicles, making real-time decisions based on environmental data.
Vehicle Safety Systems: LAMs can enhance safety features by processing data from sensors and taking preventive actions.

4. Education:

Personalized Learning: LAMs can tailor educational content and strategies based on individual student performance and needs.
Language Translation: LAMs can assist in translating educational materials into various languages.

5. Customer Service:

Automated Support: LAMs can provide automated customer support by understanding and responding to user queries.
Issue Resolution: LAMs can troubleshoot problems and guide users through issue resolution processes.

6. Home Automation:

Smart Home Control: LAMs can control smart home devices, adjusting temperature, lighting, and security systems based on user preferences.
Virtual Assistants: LAMs can act as intelligent virtual assistants, performing tasks such as setting reminders, sending messages, and managing schedules.

7. Manufacturing:

Quality Control: LAMs can analyze visual data from manufacturing processes to identify defects and ensure product quality.
Supply Chain Optimization: LAMs can optimize supply chain processes by analyzing data and making recommendations for efficiency.

8. Research and Development:

Data Analysis: LAMs can process large datasets, extract meaningful insights, and assist in research endeavors.
Innovation Support: LAMs can generate ideas and suggestions for innovation based on input criteria.

9. Entertainment:

Content Creation: LAMs can assist in generating creative content, including writing, art, and music.
Interactive Storytelling: LAMs can create dynamic and interactive storytelling experiences.

These examples showcase the versatility of LAMs in performing tasks that range from complex decision-making in healthcare to enhancing daily activities in smart homes. The ability to understand natural language and autonomously execute actions makes LAMs valuable across numerous applications and industries.

So finally it seems I can train a LAM to learn how to reply to Happy New Year greetings on WhatsApp or delete messages from a few nagging individuals on a group chat!!

3 comments:

Anurag Srivastava said...: Sir great article, didn’t know about LAM. Thinking of another use case, LAM can screen social media, like/unlike, post comments and build social media profile for individual, companies; January 24, 2024 at 10:26 PM
Anonymous said...: Very useful article to give a digital start to the day; January 28, 2024 at 7:35 AM
Anonymous said...: Sir , the status of the final results for Scientist B for NIC , Advertisement No. NIELIT/NIC/2023/1. The Notification came in Feb 2023 and written exam was held on December 12, 2023, and the interview on October 16, 2024 for Scientist B . More than 6 months has passed from Interview date, the results remain pending, causing uncertainty for candidates.

Help us sir, we are waiting for result but no Response from the Organization.; April 29, 2025 at 5:32 PM

Tuesday, January 23, 2024
