Building a robotics research setup that lives next to my desk

Posted by mplappert 7 days ago

Building a robotics research setup that lives next to my desk (dfdxlabs.com)

Quick framing, since the post is long: I did robotic manipulation research at OpenAI from 2017–2020, and the tabletop setup back then cost roughly 10x this one and took a team to run. This project is me testing whether a single person can now do meaningful work on the same class of problems, starting with the physical and software setup.

A few decisions I'm least settled on, and would love some pushback/feedback on:

- single arm vs. bimanual (I went single for cost/space, knowing it rules out things like folding cloth)

- not calibrating camera extrinsics/intrinsics for now

- RGB vs. RGB-D for from-scratch policies (ACT / Diffusion Policy)

And one I'm more confident about but expect disagreement on: not building on ROS 2 / LeRobot, and writing my own stack instead. Happy to get into the reasoning.

177 points | 60 comments

thomasikzelf 6 days ago|

Nice, I will be following your posts! I just bought a robot arm myself, the Seeed Studio B601DM (€1500, 6+1 axis). It works great, is open source hardware as well, and is a bit more solid than the so101. I also opted not to use ROS; I don't want to give up control by putting another framework in between. Is your plan to see what's possible right now, or do you also have ideas on how to improve SOTA?

mplappert 6 days ago|

Oh very cool! Looks a bit like the TRLC-DK1 (I was looking at this one for a bit).

I think pushing the sota is quite hard to do solo but we'll see. Mostly I want to get back up to speed after having not done much robotics during the last 6 years. Best way for me to learn is to just do it, so here we are. We'll see how far I get (I suspect at some point compute will be the main bottleneck)

thomasikzelf 6 days ago||

It looks like one stole the design from the other, I don't know which one, haha.

wxw 6 days ago||

Great article. I'll be following along. Would like to learn more about the robotics space.

- I've heard the advantage of ROS besides the architecture is the ecosystem (driver integrations, etc). Is that not an issue because the arm supports a Python SDK OOTB?

- Any issues you've been running into with this setup?

- How do you determine if a session recording is good enough for training? Is 50/100 samples really all you need?

mplappert 6 days ago|

Glad you like it!

Re your questions:

- The driver situation turned out totally fine; I intentionally picked HW with good python sdk support so that was very painless.

- The static camera (the C920) is not super great; it drops frames and sometimes cuts out. We’ll see how that goes, but it’s probably the thing I’m closest to swapping right now. Another issue is the reach of the arm when forcing the wrist to be axis-parallel with the table; you cannot get very far away. The chess setup demo in the video gives an example: I can just reach the row of pawns, and beyond that it’s out of reach.

- I don’t know yet! The 50-100 figure comes from the ACT and Diffusion Policy papers, but it depends on the type of task. For fine-tuning, my sense is that you only need a few hours' worth of demos to get good results with pi0.5 etc. A big reason I’m doing this project is that I want to try all of this myself, so the next posts will definitely talk about that.
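On the dropped frames mentioned above: one simple way to quantify them in a recording is to look for gaps in the frame timestamps. This is only a minimal sketch, not code from the post. The function name `find_frame_gaps`, the 30 fps nominal rate in the usage note, and the 1.5x tolerance are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def find_frame_gaps(
    timestamps_ns: Sequence[int],
    nominal_fps: float,
    tolerance: float = 1.5,
) -> List[Tuple[int, int]]:
    """Return (frame_index, estimated_missed_frames) for every suspicious gap.

    A gap is flagged when the time since the previous frame exceeds
    `tolerance` times the nominal frame period. Hypothetical helper,
    not part of any camera SDK.
    """
    period_ns = 1e9 / nominal_fps
    gaps = []
    for i in range(1, len(timestamps_ns)):
        dt = timestamps_ns[i] - timestamps_ns[i - 1]
        if dt > tolerance * period_ns:
            # Estimate how many frames should have arrived in between.
            missed = max(1, round(dt / period_ns) - 1)
            gaps.append((i, missed))
    return gaps
```

For example, calling `find_frame_gaps(frame_ts, nominal_fps=30.0)` on the timestamps of one episode gives a list of gaps; an episode with many entries is a candidate for being discarded or at least inspected before training.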

b89kim 5 days ago||

I can confirm that 50-100 demonstrations are enough for fine-tuning pi0/pi05. I did research with ALOHA and a humanoid. It works from 20~40 episodes (5~10 min), but the success rate would be 70~80%. The pi0 tech paper suggests using 1~4 hours or more of data. I could get a 95% success rate for pick & place with 1 hour of humanoid data. Anyway, the hours required for a good success rate depend on the generality of the data. Long-horizon tasks over 5 min don't work as in the paper, because PI removed the high-level (subtask) reasoning part from the released pi05.

andned 6 days ago||

I had a very similar setup. Really happy with the xarm 6 lite. I played around with the diffusion policy paper experiments and was thinking to buy a webcam as a top camera as well but I ended up buying two intel realsense ones because of the timestamp drift issues. How did you solve that? Or is camera feed syncing not necessary for your intended projects?

mplappert 6 days ago|

I timestamp everything twice: once with the hardware clock (if available, like for the realsense camera) and once within my robot stack once it gets read from the device (using `time.monotonic_ns()`). Both are stored and alignment can happen with either timestamp. I think the 2nd timestamp is actually more meaningful since ultimately I want to reconstruct the state that the policy would've seen; so if one modality is delayed I should actually include that effect during training.
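A minimal sketch of what such double timestamping could look like. This is not the actual stack; `StampedSample`, `stamp`, `timestamp_ns`, and `latest_index_at` are hypothetical names. The only thing taken from the description above is the use of `time.monotonic_ns()` at read time and the idea of keeping both stamps. Note that the device clock and the host clock have unrelated origins, so a single alignment pass should stick to one of them.

```python
import bisect
import time
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class StampedSample:
    source: str                  # e.g. "wrist_cam" (illustrative name)
    payload: Any                 # frame, joint state, ...
    device_ts_ns: Optional[int]  # hardware clock, if the device provides one
    host_ts_ns: int              # time.monotonic_ns() when the stack read it


def stamp(source: str, payload: Any, device_ts_ns: Optional[int] = None) -> StampedSample:
    """Attach the host-side timestamp at the moment the sample is read."""
    return StampedSample(source, payload, device_ts_ns, time.monotonic_ns())


def timestamp_ns(sample: StampedSample, clock: str = "host") -> int:
    """Pick the timestamp to align on; the two clocks must not be mixed."""
    if clock == "host":
        return sample.host_ts_ns
    if clock == "device":
        if sample.device_ts_ns is None:
            raise ValueError(f"{sample.source} has no hardware timestamp")
        return sample.device_ts_ns
    raise ValueError(f"unknown clock: {clock!r}")


def latest_index_at(sorted_ts_ns: Sequence[int], query_ns: int) -> Optional[int]:
    """Index of the newest sample with timestamp <= query_ns, or None.

    With host timestamps this approximates "what the policy would have
    seen" at query_ns, including any delay of that modality.
    """
    i = bisect.bisect_right(sorted_ts_ns, query_ns)
    return i - 1 if i > 0 else None
```

Aligning on the host stamp with `latest_index_at` keeps a delayed modality delayed in the training data, which matches the reasoning above; aligning on the device stamp would instead pair samples by when they were captured.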

That being said, I might switch to a realsense for the static tabletop camera as well; the realsense wrist is clearly much more reliable than the cheap Logitech C920 that I currently use.

robotresearcher 5 days ago||

Both timestamps are useful in different ways. The early-as-possible hardware stamp is best for reasoning about causality, while the later-and-full-o-jitter middleware stamps are good for compensating for that inevitable jitter.
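When both stamps are stored, the jitter can be estimated from the recording itself. The sketch below is one common approach, not something described in the thread: it assumes clock drift between device and host is negligible over the window, treats the smallest observed host-minus-device offset as the best-case baseline, and reports each sample's delay beyond that baseline. The function name `extra_delay_ns` is made up for illustration.

```python
from typing import List, Sequence, Tuple


def extra_delay_ns(pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """Per-sample delay beyond the best case, from (device_ts, host_ts) pairs.

    Assumption: the two clocks differ by a constant offset over the window,
    so variation in (host - device) is transport and scheduling jitter.
    """
    offsets = [host - device for device, host in pairs]
    baseline = min(offsets)  # constant clock offset plus minimum latency
    return [offset - baseline for offset in offsets]
```

A large spread in the result suggests the host stamps are noisy for that device, while a consistently small spread means either stamp should give similar alignment.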

Time is one of the hard problems in robots, because they are inevitably but non-obviously distributed systems.

Robots are annoyingly, wonderfully difficult.

dlt713705 6 days ago||

As impressive as this setup may be, I'm still amazed at how slow this type of robot is, whether amateur or professional grade. I have no expertise in this field, but as an observer, the apparent progress in this area seems very limited. I guess my expectations are too high and my understanding of the problems to solve is too low.

mplappert 6 days ago|

It’s partially my fault: I currently clip the max speed _and_ I only input soft control changes when teleoperating to avoid crashing into things. The robot itself could definitely move more quickly than what you see in the video.
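For readers wondering what these two limits look like in code, here is a rough per-axis sketch. It is an assumption about one way to do it, not the actual teleop code: the speed is clipped to a maximum, and the change between consecutive commands is bounded so that inputs stay soft. `limit_command`, `max_speed`, and `max_delta` are hypothetical names, and the units are whatever the arm's velocity interface uses.

```python
def limit_command(target: float, previous: float, max_speed: float, max_delta: float) -> float:
    """Clip a velocity command and bound its change per control step.

    target:    speed requested by the teleop input for one axis
    previous:  command sent on the previous control step
    max_speed: absolute speed limit
    max_delta: largest allowed change between consecutive commands
    """
    # 1) Clip the requested speed to the allowed range.
    clipped = max(-max_speed, min(max_speed, target))
    # 2) Limit how far the command may move away from the previous one.
    delta = max(-max_delta, min(max_delta, clipped - previous))
    return previous + delta
```

Called once per control step with the previous output fed back in, a sudden full-speed request ramps up over several steps instead of jumping, at the cost of slower motion overall.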

It would be interesting to explore how RL can be applied on top of my (flawed) human demos to optimize beyond what I’m able to do.

forrestthewoods 5 days ago||

> And one I'm more confident about but expect disagreement on: not building on ROS 2 / LeRobot,

Tell me more! I am slightly biased in that direction. But can’t fully justify it at this point.

whiplash451 6 days ago||

How does LeRobot prevent « full control » and « understanding »? I thought this was an open source library.

I am not an official supporter of the library but am asking out of curiosity.

mplappert 6 days ago|

For understanding: I think the level is much deeper if I write the code vs reading someone else’s. The same applies to coding agents of course, which is why I wrote most of it myself and only delegate some tasks (for example, Codex was a great help at setting up telemetry dashboards or writing the custom GLFW renderer).

On control: LeRobot will change all the time and I’ll be unaware of what changed. If something suddenly doesn’t work anymore, it’s a pain to find out. I can of course fork or pin but that defeats the purpose a bit.

At the end it’s also partially just preference: I wanted to write this layer myself, and I have opinions about how it should be architected, so I did.

bjt12345 5 days ago||

I hate whinging, but why isn't stuff like this moved higher on HN's front page? This is a great article, yet I keep seeing world politics and other matters rated higher - stuff that (unlike this article) will age like milk.

modeless 5 days ago|

HN doesn't seem interested in robotics generally. You'll see it from time to time but the vast majority of great stuff never makes it to the front page. It's a shame, especially considering (as you point out) the constant political stuff (both overt and subtle) that does make it on here.

gessha 5 days ago||

It could be that polarizing threads end up more actively voted on compared to “cool stuff” threads. We’re not immune to social network effects after all.

modeless 5 days ago||

Yeah, that's an issue only admins can address. I would prefer much more aggressive enforcement of the rules about politics, and maybe some rule changes.

gessha 4 days ago||

I built a very crude app sourcing comments from the HN firehose, classifying them using Gemma 4 and a prompt made from the comment guidelines. It had some amusing results.

The app did a decent job at surfacing problematic comments that a mod can do something about.

It was cool to optimize llama-cpp arguments for throughput. During the slightly off-peak hours the post processing was pretty much real time. I suspect a second 3090 would’ve been enough for peak posting hours too.

modeless 4 days ago||

Yes I think there's huge potential in this direction. Forum moderation could be made far more scalable this way. If I didn't have another project I was already working on I'd be trying it right now.

timsuchanek 6 days ago||

This is really exciting. Incredible that you can do this for this budget at home. Unthinkable a couple years ago.

mplappert 6 days ago|

Thanks Tim and fun seeing you here :)

MrRobotics 6 days ago||

Fascinating article. Keep up the work, Matthias!

blt 6 days ago|

ROS sucks, good move. Too complicated

mplappert 6 days ago||

That was my take 8 years ago; glad to hear it’s still the case.

laxpri 6 days ago||

what's the good alternative?
