Monday, May 4, 2026

DAIMON Robotics Wants to Give Robot Hands a Sense of Touch


This article is brought to you by DAIMON Robotics.

This April, Hong Kong-based DAIMON Robotics launched Daimon-Infinity, which it describes as the largest omni-modal robotic dataset for physical AI, featuring high-resolution tactile sensing and spanning a wide range of tasks, from folding laundry at home to manufacturing on factory assembly lines. The project is supported by the collaborative efforts of partners across China and the globe, including Google DeepMind, Northwestern University, and the National University of Singapore.

The move signals a key strategic initiative for DAIMON, a two-and-a-half-year-old company known for its advanced tactile sensor hardware, most notably a monochromatic, vision-based tactile sensor that packs over 110,000 effective sensing units into a fingertip-sized module. Drawing on its high-resolution tactile sensing technology and a distributed out-of-lab collection network capable of producing millions of hours of data annually, DAIMON is building large-scale robot manipulation datasets that include vast amounts of tactile sensing data. To accelerate the real-world deployment of embodied AI, the company has also open-sourced 10,000 hours of its data.

Prof. Michael Yu Wang, co-founder and chief scientist at DAIMON Robotics, has pioneered the Vision-Tactile-Language-Action (VTLA) architecture, elevating touch to a modality on par with vision. Credit: DAIMON Robotics

Behind the strategy is Prof. Michael Yu Wang, DAIMON’s co-founder and chief scientist. Prof. Wang earned his PhD at Carnegie Mellon, studying manipulation under Matt Mason, and went on to found the Robotics Institute at the Hong Kong University of Science and Technology. An IEEE Fellow and former Editor-in-Chief of IEEE Transactions on Automation Science and Engineering, he has spent roughly four decades in the field. His goal is to address the “insensitivity” of robot manipulation, which relies largely on the dominant Vision-Language-Action (VLA) model. He and his team have pioneered the Vision-Tactile-Language-Action (VTLA) architecture, elevating touch to a modality on par with vision.

We spoke with Prof. Wang about how tactile feedback aims to change dexterous manipulation, how the dataset initiative is expected to improve our understanding of robot hands in natural environments, and where (from hotels to convenience stores in China) he sees touch-enabled robots making their first real-world inroads.

Daimon-Infinity is the world’s largest omni-modal dataset for physical AI, featuring million-hour-scale multimodal data, ultra-high-resolution tactile feedback, data from 80+ real scenarios and 2,000+ human skills, and more. Credit: DAIMON Robotics

The Dataset Initiative

This month, DAIMON Robotics launched the largest and most comprehensive robotic manipulation dataset with multiple leading academic institutions and enterprises. Why release the dataset now, rather than continuing to focus on product development? What impact will this have on the embodied intelligence industry?

DAIMON Robotics has been around for almost two and a half years. We have been committed to developing high-resolution, multimodal tactile sensing devices to perceive the interaction between a robot’s hand (particularly its fingertips) and objects. Our devices have become quite robust. They are now accepted and used by a large segment of users, including academic and research institutes as well as leading humanoid robotics companies.

As embodied AI continues to advance, the critical role of data has become clearer. Data scarcity remains a major bottleneck in robot learning, particularly the lack of physical interaction data, which is essential for robots to operate effectively in the real world. As a result, data quality, reliability, and cost have become major concerns in both research and commercial development.

This is exactly where DAIMON excels. Our vision-based tactile technology captures high-quality, multimodal tactile data. Beyond basic contact forces, it records deformation, slip and friction, material properties, and surface textures, enabling a comprehensive reconstruction of physical interactions. Building on our expertise in multimodal fusion, we have developed a robust data processing pipeline that seamlessly integrates tactile feedback with vision, motion trajectories, and natural language, transforming raw inputs into training-ready datasets for machine learning models.
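To make the idea of fusing these streams concrete, here is a minimal sketch of one common preprocessing step: aligning separately recorded streams by timestamp so that each tactile frame is paired with the nearest camera frame and joint reading. This is an illustration only, not DAIMON’s pipeline; the `Sample` record, its field names, and the `align` and `nearest` helpers are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Sample:
    """One fused record. All field names here are hypothetical."""
    t: float           # timestamp in seconds (taken from the tactile stream)
    tactile: str       # identifier of the tactile image captured at t
    vision: str        # identifier of the camera image nearest in time to t
    joints: tuple      # joint positions nearest in time to t
    instruction: str   # natural-language description of the task


def nearest(times, t):
    """Return the index of the entry in sorted `times` that is closest to t."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Pick whichever neighbour is closer; ties go to the earlier entry.
    return i if times[i] - t < t - times[i - 1] else i - 1


def align(tactile, vision, joints, instruction):
    """Pair every tactile frame with the nearest vision frame and joint reading.

    Each stream is a non-empty list of (timestamp, value) pairs sorted by timestamp.
    """
    v_times = [t for t, _ in vision]
    j_times = [t for t, _ in joints]
    return [
        Sample(
            t=t,
            tactile=frame,
            vision=vision[nearest(v_times, t)][1],
            joints=joints[nearest(j_times, t)][1],
            instruction=instruction,
        )
        for t, frame in tactile
    ]
```

Nearest-timestamp matching is only one possible design choice; a real pipeline might interpolate joint positions or resample every stream to a shared clock instead.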

Recognizing the industry-wide data gap, we view large-scale data collection not only as our unique competitive advantage, but as a responsibility to the broader community.

By building and open-sourcing the dataset, we aim to provide the high-quality “fuel” needed to power embodied AI, ultimately accelerating the real-world deployment of general-purpose robot foundation models.

The robotics industry is highly competitive, and many teams have chosen to focus on data. DAIMON is releasing a large and highly comprehensive cross-embodiment, vision-based tactile multimodal robotic manipulation dataset. How were you able to achieve this?

We have a dedicated in-house team focused on expanding our capabilities, including building hardware devices and developing our own large-scale model. Although we are a relatively small company, our core tactile sensing technology and innovative data collection paradigm enable us to build large-scale datasets.

Our approach is to broaden our offering. We have built the world’s largest distributed out-of-lab data collection network. Rather than relying on centralized data factories, this lightweight and scalable system allows data to be gathered across diverse real-world environments, enabling us to generate millions of hours of data per year.

This dataset is being jointly developed with multiple institutions worldwide. What roles did they play in its development, and how will the dataset benefit their research and products?

Besides China-based teams, our partners include leading research groups from universities, such as Northwestern University and the National University of Singapore, as well as top global enterprises like Google DeepMind and China Mobile. Their decision to partner with DAIMON is a strong testament to the value of our tactile-rich dataset.

Among the companies involved are some that have already built their own models but are now incorporating tactile information. By deploying our data collection devices across research, manufacturing, and other real-world scenarios, they help us gather highly practical, application-driven data. In turn, our partners leverage the data to train models tailored to their specific use cases. Additionally, to drive the advancement of the entire embodied AI field, we have open-sourced 10,000 hours of the dataset for the broader community.

Equipped with DAIMON’s visuotactile sensor, the gripper delicately senses contact and precisely controls force to pick up a fragile eggshell. Credit: DAIMON Robotics

From VLA to VTLA: Why Tactile Sensing Changes the Equation

The mainstream paradigm in robotics is currently the Vision-Language-Action (VLA) model, but your team has proposed a Vision-Tactile-Language-Action (VTLA) model. Why is it necessary to incorporate tactile sensing? What does it enable robots to achieve, and which tasks are likely to fail without tactile feedback?

Over these years of working to make generalist robots capable of performing manipulation tasks, especially dexterous manipulation (not just power grasping or holding an object, but manipulating objects and using tools to impart forces and motion onto parts), we see these robots being used in household as well as industrial assembly settings.

It is well established that tactile information is essential for providing feedback about contact states so that robots can guide their hands and fingers to perform reliable manipulation. Without tactile sensing, robots are severely limited. They struggle to locate objects in dark environments, and without slip detection, they can easily drop fragile items like glass. Furthermore, the inability to precisely control force often leads to failed manipulation tasks or, in severe cases, physical damage. Naturally, the VLA approach needs to be enhanced to incorporate tactile information. We expanded the VLA framework to incorporate tactile data, creating the VTLA model.

An additional benefit of our tactile sensor is that it is vision-based: We capture visual images of the deformation on the fingertip surface. We capture multiple images in a time sequence that encodes contact information, from which we can infer forces and other contact states. This aligns well with the visual framework that VLA is based upon. Having tactile information in a visual image format makes it naturally suitable for integration into the VLA framework, transforming it into a VTLA system. That is the key advantage: Vision-based tactile sensors provide very high resolution at the pixel level, and this data can be incorporated into the framework, whether it is an end-to-end model or another type of architecture.
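As a rough illustration of how a time sequence of tactile images can encode contact information, the sketch below compares each frame with a no-contact reference image and tracks where the surface has changed most. It is a toy example under simple assumptions (small grayscale images stored as nested lists, a fixed intensity threshold), not DAIMON’s method; the function names are hypothetical, and a real sensor would rely on calibrated models to turn deformation into forces.

```python
def deformation_map(reference, frame):
    """Per-pixel absolute intensity change relative to the no-contact reference."""
    return [
        [abs(p - r) for p, r in zip(row, ref_row)]
        for row, ref_row in zip(frame, reference)
    ]


def contact_centroid(dmap, threshold=10):
    """Intensity-weighted centre (row, col) of pixels that changed by more than
    `threshold`, or None if no pixel exceeds it (treated here as "no contact")."""
    total = sum_r = sum_c = 0.0
    for r, row in enumerate(dmap):
        for c, value in enumerate(row):
            if value > threshold:
                total += value
                sum_r += r * value
                sum_c += c * value
    if total == 0:
        return None
    return (sum_r / total, sum_c / total)


def contact_track(reference, frames, threshold=10):
    """Contact centroid for every frame in a time-ordered sequence."""
    return [
        contact_centroid(deformation_map(reference, f), threshold)
        for f in frames
    ]
```

In this simplified picture, a centroid that drifts across consecutive frames while the grip is held would hint at the object sliding across the fingertip. Because each frame is an ordinary image, the same data could in principle be passed to an image encoder alongside camera frames.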

DAIMON is known for its vision-based tactile sensors, which pack over 110,000 effective sensing units into a fingertip-sized module. Credit: DAIMON Robotics

The Technology: Monochromatic Vision-based Tactile Sensing

You and your team have spent many years deeply engaged in vision-based tactile sensing and have developed the world’s first monochromatic vision-based tactile sensing technology. Why did you choose this technical path?

Once we started investigating tactile sensors, we understood our needs. We wanted sensors that closely mimic what we have under our fingertip skin. Physiological studies have well documented the capabilities humans have at their fingertips: knowing what we touch, what kind of material it is, how forces are distributed, and whether it is moving into the right position as our brain controls our hands. We knew that replicating these capabilities on a robot hand’s fingertips would help considerably.

When we surveyed existing technologies, we found many types, including vision-based tactile sensors with tri-color optics and other simpler designs. We decided to integrate the best of these into an engineering-robust solution that works well without being overly complicated, keeping cost, reliability, and sensitivity within a satisfactory range, and this ultimately led us to develop a monochromatic vision-based tactile sensing technique. This is essentially an engineering approach rather than a purely scientific one, since a lot of foundational research already existed. With the growing realization of the necessity of tactile data, all of this will advance hand in hand.

DAIMON’s vision-based tactile sensor captures high-quality, multimodal tactile data, including force, geometry, material, and contact information. Credit: DAIMON Robotics

Last year, DAIMON released a multi-dimensional, high-resolution, high-frequency vision-based tactile sensor. Compared with traditional tactile sensors, where does its core advantage lie? Which industries could it potentially transform?

The key features of our sensors are the density of distributed force measurement and the deformation we can capture over the area of a fingertip. I believe we have the highest density in terms of sensing units. That is one important metric. The other is dynamics: the frequency and bandwidth, meaning how quickly we can detect force changes, transmit signals, and process them in real time. Other important aspects are mostly engineering-related, such as reliability, drift, durability of the soft surface, and resistance to interference from magnetic, optical, or environmental factors.

A growing number of researchers and companies are recognizing the importance of tactile sensing and adopting our technology. I believe the advances in tactile sensing will elevate the entire community and industry to a higher level. One of our potential customers is deploying humanoid robots in a small convenience store, with densely packed shelves where shelf space is at a premium. The robot needs to reach into very tight spaces (tighter than books on a shelf) to pick an object. Existing two-jaw parallel grippers cannot fit into most of these spaces. Observing how humans pick up objects, you clearly need at least three slender fingers to touch the object, roll it toward you, and secure it. Thus, we are starting to see very specific needs where tactile sensing capabilities are essential.

From Academia to Startup

After 40 years in academia (founding the HKUST Robotics Institute, earning prestigious honors including IEEE Fellow, and serving as Editor-in-Chief of IEEE TASE), what motivated you to found DAIMON Robotics?

I have come a long way. I started learning robotics during my PhD at Carnegie Mellon, where there were truly outstanding groups working on locomotion under Marc Raibert, who founded Boston Dynamics, and on manipulation under my advisor, Matt Mason, a leader in the field. We have been working on dexterous manipulation, not only at Carnegie Mellon, but globally for many years.

However, progress has been limited for a long time, especially in building dexterous hands and making them work. Only recently have locomotion robots really taken off, and only in the last few years have we begun to see major developments in robot hands. There is clearly room for advancing manipulation capabilities, which would enable robots to do work like humans. While at the Hong Kong University of Science and Technology, I saw increasingly talented people coming into this area as students and postdoctoral researchers. We wanted to jumpstart our effort by leveraging the available capital and talent resources.

Fortunately, one of my postdocs, Dr. Duan Jianghua, has a strong sense for commercial opportunities. Recognizing the rapid growth of the robotics market and the unique value that our vision-based tactile sensing technology could bring, together we started DAIMON Robotics, and it has progressed well. The community has grown tremendously in China, Japan, Korea, the U.S., and Europe.

Robots equipped with DAIMON technology have been deployed in factory settings. The company aims to enable robots to achieve “embodied intelligence” and close the gap between what they can see and what they can feel. Credit: DAIMON Robotics

Business Model and Commercial Strategy

What is DAIMON’s current business model and strategic focus? What role does the dataset release play in your commercial strategy?

We started as a device company focused on making highly capable tactile sensors, especially for robot hands. But as technology and business evolved, everyone realized it is not just about one component, but rather the entire technology chain: devices, data of sufficient quality and quantity, and finally the right framework to build, train, and deploy models on robots in real application environments.

Our business strategy is best described as “3D”: Devices, Data, and Deployment. We build devices for data collection, for our own ecosystem, and for deployment in our partners’ potential application domains. This enables the collection of real-world tactile-rich data and full closed-loop validation, which will become an integral part of the 3D business model. Most startups in this space are following a similar path until eventually some may become more specialized or more tightly integrated with other companies. For now, it is mostly vertical integration.

Embodied Skills and the Convergence Moment

You have introduced the concept of “embodied skills” as essential for humanoid robots to move beyond having just an advanced AI “brain.” What prompted this insight? What new capabilities could embodied skills enable? After the rapid evolution of models and hardware over the past two years, has your definition or roadmap for embodied skills evolved?

We have come a long way and now see a convergence point where electrical, electronic, and mechatronic hardware technologies have advanced tremendously in the last 20 years. Robots are now fully electric and do not require hydraulics, because hardware has evolved rapidly. Modern electronics provide high bandwidth with high torques. If we can build intelligence into these systems, we can create truly humanoid robots with the ability to operate in unstructured environments, make decisions, and take actions autonomously.

AI has arrived at exactly the right time. Enormous resources have been invested in AI development, especially large language models, which are now being generalized into world models that enable physical AI capabilities. We would like to see these manifested in real-world systems.

While both AI and core hardware technologies continue to evolve, the focus is much clearer now. For example, human-sized robots are preferred in a home setting. This is an exciting space with a promise of great societal benefit if we can ultimately achieve safe, reliable, and cost-effective robots.

The Road to Real-World Deployment

Today, many robots can deliver impressive demos, yet there remains a gap before they truly enter real-world applications. What could be a potential trigger for real-world deployment? Which scenarios are most likely to achieve large-scale deployment first?

I think the road toward large-scale deployment of generalist robots is still long, but we are starting to see signs of feasibility within specific domains. It is very similar to autonomous vehicles, where we are yet to see full deployment of robo-taxis, while we have already started to find mobile robots and smaller vehicles widely deployed in the hospitality industry. Virtually every major hotel in China now has a delivery robot: no arms, just a vehicle that picks up items from the hotel lobby (e.g., food deliveries). The delivery person just loads the food and selects the room number. It is then up to the robot to navigate to the guest’s room, which includes using the elevator, and deliver the food. This is already nearly 100 percent deployed in major Chinese hotels.

Hotel and restaurant robots are viewed as a model for deploying humanoid robots in specific domains like overnight drugstores and convenience stores. I expect full deployment in such settings within a short timeframe, followed by other applications. Overall, we can expect autonomous robots, including humanoids, to gradually penetrate specific sectors, delivering value in each and expanding into others.

Ultimately, our vision is for robots to achieve robust manipulation capabilities and evolve into reliable partners for humans. By seamlessly integrating into our homes and daily lives, they will genuinely benefit and serve humanity.

This interview has been edited for length and clarity.
