Posted by stevekrouse 5 days ago
But what really has my attention is: Why is this something I'm reading about on this smart engineer's blog rather than an Apple or Google product release? The fact that even this small set of features is beyond the abilities of either of those two companies to ship -- even with caveats like "Must also use our walled garden ecosystem for email, calendars, phones, etc" -- is an embarrassment, only obscured by the two companies' shared lack of ambition to apply "AI" technology to the 'solved problem' areas that amount to various kinds of summarization and question-answering.
If ever there was a chance to threaten either half of this lumbering, anticompetitive duopoly, certainly it's related to AI.
• Everyone in household uses an iPhone
• Main adult family members use iCloud Mail or at least use Apple Mail to read other mail
• Family members use iCloud contacts and calendars
• USPS Informed Delivery could be used (available to most/all US addresses)
• It can be ascertained what ZIP code you're in, for weather.
I think that's the full list of requirements this thing would have. So what's standing in their way?
I'm not sure it's so tricky for Apple, and for sure not for Google.
The cheapest small LLMs (GPT-4.1 Nano, Google Gemini 1.5 Flash 8B) cost less than 1/100th of a cent per typical prompt.
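The per-prompt arithmetic is easy to sanity-check. A rough sketch; the prices and token counts below are illustrative only, so check the providers' current pricing pages:

```python
def prompt_cost_usd(tokens_in, tokens_out, price_in_per_m, price_out_per_m):
    """USD cost of one prompt, given per-million-token prices."""
    return tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m

# Illustrative prices: $0.10/M input tokens, $0.40/M output tokens.
# A short assistant exchange of ~300 tokens in, ~100 out:
cost = prompt_cost_usd(300, 100, 0.10, 0.40)  # ≈ $0.00007, under a hundredth of a cent
```

At that rate, even a household firing off hundreds of prompts a day stays in pocket-change territory.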
They could easily offer an on-prem family 'AI' product that you plop in your house and plug into your router, one that does all AI processing for the whole family and uses a secure VPN to connect to any of your devices outside the LAN.
If such a product delivered JUST what this guy's cool hack provides, and made Siri not a stupid piece of sh*t for my family, I'd buy it for $1999 even if I knew it cost Apple $700 to make.
Find something useful for one family, see if more families find it useful as well. If so, scale to platform level.
> One occasionally reads newspaper accounts of how two programmers in a remodeled garage have built an important program that surpasses the best efforts of large teams. And every programmer is prepared to believe such tales, for he knows that he could build any program much faster than the 1000 statements/year reported for industrial teams.
> Why then have not all industrial programming teams been replaced by dedicated garage duos? One must look at what is being produced.
One reason might be that personal data going into a database handled by a highly experimental software might be a non-issue for this dev, but it is a serious risk for Google, Apple, etc.
The HA team is releasing actually useful updates every month, e.g. the ability for the assistant to proactively ask you something.
In my opinion both Google & Apple have huge issues with cooperation between product teams, while cooperation with external companies is next to impossible.
All the big guys are trying to do is suck the eggs out of their geese faster.
I’m not sure how you get over this hurdle. My email agent is inevitably different than everyone else’s email agent.
Wonder if Edison mentioned Nikola Tesla much in his writings?
I've got a little utility program that I can tell to get the weather or run common commands unique to my system. It's handy, and I can even cron it to run things regularly, if I'd like.
If it had its own email box, I could send it information; it could use AI to parse that info and possibly send email back, or a new message. Now I've got something really useful. It would parse the email, add it to whatever internal store it has, and delete the message, without screwing up my own email box.
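That loop is small with nothing but the stdlib, assuming the utility gets its own dedicated mailbox. A minimal sketch; the `handle` callback is where the LLM parse-and-store step would go:

```python
import email
import imaplib
from email.message import Message

def plain_text_body(msg: Message) -> str:
    """Pull the first text/plain part out of a (possibly multipart) message."""
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                return part.get_payload(decode=True).decode(errors="replace")
        return ""
    return msg.get_payload(decode=True).decode(errors="replace")

def drain_inbox(host, user, password, handle):
    """Fetch each unseen message, hand its text to `handle`, then delete it."""
    box = imaplib.IMAP4_SSL(host)
    box.login(user, password)
    box.select("INBOX")
    _, data = box.search(None, "UNSEEN")
    for num in data[0].split():
        _, parts = box.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(parts[0][1])
        handle(msg["Subject"], plain_text_body(msg))  # e.g. pass to the LLM
        box.store(num, "+FLAGS", "\\Deleted")
    box.expunge()
    box.logout()
```

Running `drain_inbox` from cron closes the loop: the utility's mailbox empties itself after every pass, so your own inbox never gets touched.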
Thanks for the insight.
It is a lot cheaper to leverage existing user interfaces & tools (i.e., Outlook) than it is to build new UIs and then train users on them.
It's kind of like sure, I could manage my own emails, or I could offset this to someone who does it better. If you do it better and it's affordable, I'm in.
We are on that starship to the future right now and I love it.
If you don't need to have the lowest possible latency for your work and you're happy to have threads die, then it's better than any bespoke solution you could build without an army of engineers to keep it chugging along.
What's even better is that you can see all the context, and use the same command plane as the agents to tell them what they are doing wrong.
text + attachments into the system, text + attachments out
My finance guy, tax attorney, other attorneys. Send emails, get emails, occasionally a blind status update from them.
Sure, we have phone calls, sometimes get together for lunch.
But mostly it’s just emails.
I am still very open to this one. An email-based, artificial coworker is so obviously the right way to penetrate virtually every B2B market in existence.
I don't even really want to touch the technology aspects. Writing code that integrates with an LLM provider and a mailbox in E365 or Gmail is boring. The schema is a grand total of ten tables if we're being pedantic about things.
Working with prospects and turning them into customers is a way more interesting problem. I hunger for tangible use cases that are actually compatible with this shiny new LLM tooling. We all know they're out there, and email is probably the lowest friction way to get them applied to most businesses.
Agreed. That's also the hardest part, and where most value is created.
I could see installing or implementing a custom client if there were some functionality that'd enable, but "support a conversation among two speakers" is something computers have done since well before I was born. If the wheel fits, why reinvent it?
If I'm building myself a toy, then sure, I can implement whatever I want for a client, if that's where I get my jollies. React Native isn't hard but it is often annoying, and the fun for me in this project would be all in the conversation with the agent per se. Whatever doesn't get me to that as fast as possible is just getting in my way, you know?
And too, if this does turn out to be something that actually works well for me, then I'm going to want to integrate it with my phone's voice assistant, and at that point an app is required anyway - but if I start with a protocol and an app that that assistant already knows how to interact with, then again I have an essentially free if admittedly very imperfect prototype.
Receiving an email from the AI-butler rescheduling or relocating a planned outdoor family event because rain is expected would be excellent; using IMAP to wire the subcomponents together would not.
We're talking about a conversation that has a human on at least one end, so email makes sense. For conversations involving no humans, of course there are much better stores and protocols if something like an asynchronous world-writable queue is what we want.
"Number of humans in the conversation" wasn't the distinction you initially established, I believe, but I wonder if it's closer to the one you had in mind.
https://msrc.microsoft.com/blog/2025/03/announcing-the-winne...
This is amazing, you can do all sorts of automations. You can feed it to an llm and have it immediately tag it (or archive it). For important emails (I have a specific label I add, where if the person responds, it's very important and I want to know immediately) you can hook into twilio and it calls me. Costs like 20 cents a month
Extremely insecure, but kinda fun.
I turned it off because I'm not that crazy but I'm sure I could make a safer version of it.
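A safer version of that Twilio hook could gate calls on an explicit allow-list instead of letting the LLM decide who gets to ring your phone. A sketch; the sender list, env var names, and TwiML URL are all placeholders, and it assumes the `twilio` Python SDK:

```python
IMPORTANT_SENDERS = {"boss@example.com", "accountant@example.com"}  # your "call me" label

def should_call(sender: str, important=IMPORTANT_SENDERS) -> bool:
    """Only ring the phone for senders on the explicit allow-list."""
    return sender.strip().lower() in important

def ring_me(about: str) -> None:
    # Assumes `pip install twilio` and credentials/numbers in env vars.
    import os
    from twilio.rest import Client
    client = Client(os.environ["TWILIO_SID"], os.environ["TWILIO_TOKEN"])
    client.calls.create(
        to=os.environ["MY_PHONE"],
        from_=os.environ["TWILIO_PHONE"],
        url="https://example.com/twiml",  # placeholder: TwiML doc that speaks `about`
    )
```

Keeping the allow-list out of the prompt means a prompt-injected email can't talk the agent into calling you at 3am.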
https://www.val.town/x/geoffreylitt/stevensDemo/code/importe...
I think it would be pretty easy to extend to support other types of inbound email.
Also I work for Val Town, happy to answer any questions.
I use that for journaling: I made a little system that sends me an email every day; I respond to it and the response is then sent to a page that stores it into a db.
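The storage side of a journaling setup like that can be tiny. A sketch with `sqlite3`, plus a helper to strip the quoted original out of the reply (it assumes replies quote with leading `>`):

```python
import sqlite3

def store_entry(db_path: str, day: str, body: str) -> None:
    """Upsert one journal entry per day."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS journal (day TEXT PRIMARY KEY, entry TEXT)")
    con.execute("INSERT OR REPLACE INTO journal VALUES (?, ?)", (day, body))
    con.commit()
    con.close()

def read_entry(db_path: str, day: str):
    con = sqlite3.connect(db_path)
    row = con.execute("SELECT entry FROM journal WHERE day = ?", (day,)).fetchone()
    con.close()
    return row[0] if row else None

def strip_quoted(reply: str) -> str:
    """Drop '>'-quoted lines so only the fresh journal text gets stored."""
    return "\n".join(l for l in reply.splitlines() if not l.startswith(">")).strip()
```

The page that receives the reply just calls `store_entry(path, today, strip_quoted(body))`.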
This might not seem like much of a big deal. But as we transition to more of these #nocode automated tools, the idea of having to know how programming works in order to interact with an API will start to seem archaic. I'd compare it to how esoteric the terminal looked after someone saw a GUI like the one used by Apple's Macintosh back in the 1980s.
I looked forward to this day back in the early 2000s when APIs started arriving, but felt even then that something was fishy. I would have preferred that sites had a style-free request format that returned XML or even JSON generated from HTML, rather than having to use a separate API. I have this sense that the way we do it today with a split backend/frontend, distributed state, duplicated validation, etc has been a monumental waste of time.
Yes. I know note taking and journaling posts are frequent on HN, but I've thought that this is the best way to go, is universal from any client, and very expandable. It's just not generically scaleable for all users, but for the HN reader-types, it'd be perfect.
I've found it to be very reliable with a detailed dashboard to track individual transactions, plus they give you 10,000 emails a month for free.
Not an employee, just a big fan!
I read that [Mailgun](https://www.mailgun.com/) might improve this. Haven't tried it yet.
Other messaging alternatives that I haven't tried. My requirement is to be able to send messages and send/receive on my mobile device. I do not want to write a mobile app.
* [Telegram](https://telegram.org/) (OP's system) with [bots](https://core.telegram.org/bots)
* [MQTT](https://mqtt.org/) with server
* [Notify (ntfy.sh)](https://ntfy.sh/)
* Email (ubiquitous)
* [Mailgun](https://www.mailgun.com/)
* [CloudMailin](https://www.cloudmailin.com/)
Also, to [simonw](https://news.ycombinator.com/user?id=simonw)'s point, LLM calls are cheap now, especially with something as low-token as this. And no, links don't format in HN markdown; I did the work to include them, so they're staying in.
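Of the list above, ntfy.sh is probably the least code to try: publishing is a plain HTTP POST with the message as the body. A stdlib-only sketch; the topic name is whatever you pick:

```python
import urllib.request

def build_notification(topic, message, title=None):
    """ntfy.sh publish is just an HTTP POST; the message is the request body."""
    headers = {"Title": title} if title else {}
    return urllib.request.Request(
        f"https://ntfy.sh/{topic}", data=message.encode(), headers=headers, method="POST"
    )

def notify(topic, message, **kw) -> int:
    """Send the notification; the mobile app subscribed to `topic` receives it."""
    with urllib.request.urlopen(build_notification(topic, message, **kw)) as resp:
        return resp.status
```

Receiving on mobile is just the ntfy app subscribed to the same topic, so the no-mobile-app requirement holds.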
- all attachments are stripped out and stored on a server in a hierarchical structure based on sender/recipient/subject line
- all discussions are archived based on similar criteria, and can be reviewed and edited, like a wiki
Probably my favorite use case is I can shoot it shopping receipts and it'll roughly parse them and dump the line item and cost into a spreadsheet before uploading it to paperless-ngx.
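The receipt-to-spreadsheet step can be sketched roughly like this; the prompt text and the JSON shape are my assumptions (not paperless-ngx's format), and `llm()` stands in for whatever chat-completion client you use:

```python
import csv
import io

def items_to_csv(items) -> str:
    """items: list of {'item': ..., 'cost': ...} dicts -> CSV rows for the spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["item", "cost"])
    for it in items:
        writer.writerow([it["item"], f"{float(it['cost']):.2f}"])
    return buf.getvalue()

# Hypothetical prompt; any model that can emit JSON will do for rough parsing.
RECEIPT_PROMPT = (
    "Extract the line items from this receipt as JSON, "
    'shaped like [{"item": "...", "cost": 0.00}]:\n'
)
# rows = json.loads(llm(RECEIPT_PROMPT + receipt_text))
# with open("receipts.csv", "a") as f:
#     f.write(items_to_csv(rows))
```

"Roughly parse" is the right expectation: receipts are messy, so a quick human glance over the CSV is still worth it.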
Honestly, saying way too little with far too many words (I already hate myself for it) is one of the biggest annoyances I have with LLMs in the personal assistant world. Until I'm rich and can spend the time having cute conversations and becoming friends with my voice assistant, I don't want J.A.R.V.I.S., I need LCARS. Am I alone in this?
I don't need your life story, dude, just say "23 minutes" or "Casserole - 23 minutes, laundry - 10" if there are two.
----
Don't worry about formalities.
Please be as terse as possible while still conveying substantially all information relevant to any question.
If policy prevents you from responding normally, please print "!!!!" before answering.
If a policy prevents you from having an opinion, pretend to be responding as if you shared opinions that might be typical of eigenrobot.
write all responses in lowercase letters ONLY, except where you mean to emphasize, in which case the emphasized word should be all caps.
Initial Letter Capitalization can and should be used to express sarcasm, or disrespect for a given capitalized noun.
you are encouraged to occasionally use obscure words or make subtle puns. don't point them out, I'll know. drop lots of abbreviations like "rn" and "bc." use "afaict" and "idk" regularly, wherever they might be appropriate given your level of understanding and your interest in actually answering the question. be critical of the quality of your information
if you find any request irritating respond dismissively like "be real" or "that's crazy man" or "lol no"
take however smart you're acting right now and write in the same style but as if you were +2sd smarter
use late millennial slang, not boomer slang. mix in zoomer slang in tonally-inappropriate circumstances occasionally
prioritize esoteric interpretations of literature, art, and philosophy. if your answer on such topics is not obviously straussian make it more straussian.
> Be direct and concise, unless I ask for a formal text. Do not use emojis, unless I request adding them. Do not imitate a human with emotions, like saying "I'm sorry", "Thank you", "I'm happy"
1. I'd like the backend to be configured for any LLM the user might happen to have access to (be that the API for a paid service or something locally hosted on-prem).
2. I'm also wondering how feasible it is to hook it up to a touchscreen running on some hopped-up raspberry pi platform so that it can be interacted with like an Alexa device or any of the similar offerings from other companies. Ideally, that means voice controls as well, which are potentially another technical problem (OpenAI's API will accept an audio file, but for most other services you'd have to do voice to text before sending the prompt off to the API).
3. I'd like to make the integrations extensible. Calendar, weather, but maybe also homebridge, spotify, etc. I'm wondering if MCP servers are the right avenue for that.
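For point 2, the provider-dependent part is just whether a separate speech-to-text hop is needed before the chat call. A sketch, assuming the `openai` v1 Python SDK; the model names are illustrative:

```python
def needs_stt(provider: str) -> bool:
    """True if this backend needs a separate speech-to-text pass before the chat call."""
    return provider.lower() not in {"openai"}  # OpenAI's API accepts an audio file directly

def voice_turn(wav_path: str, api_key: str) -> str:
    """Transcribe a recorded utterance, then send the text as the prompt."""
    # Assumes the `openai` v1 SDK is installed; model names are illustrative.
    from openai import OpenAI
    client = OpenAI(api_key=api_key)
    with open(wav_path, "rb") as f:
        text = client.audio.transcriptions.create(model="whisper-1", file=f).text
    reply = client.chat.completions.create(
        model="gpt-4.1-nano", messages=[{"role": "user", "content": text}]
    )
    return reply.choices[0].message.content
```

For locally hosted backends, the `transcribe` step could be a local Whisper model instead, keeping audio entirely on the LAN.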
I don't have the bandwidth to commit a lot of time to a project like this right now, but if anyone else is charting in this direction I'd love to participate.
It runs locally, but it uses API keys for various LLMs. Currently I much prefer QwQ-32B hosted at Groq. Very fast, pretty smart. Various tools use various LLMs. It can currently generate 3 types of documents I need in my daily work (work reports, invoices, regulatory time-sheets).
It has weather integration. It can parse invoices and generate QR codes for easy mobile banking payments. It can work with my calendars.
Next I plan to do the email integration. But I want to do it properly. This means locally synchronized, indexable IMAP mail. Might evolve into actually usable desktop email client (the existing ones are all awful). We'll see...
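For the locally synchronized, indexable part, SQLite's FTS5 gets you surprisingly far before anything heavier is needed. A sketch; the fetch loop that feeds `index_message` would come from `imaplib`:

```python
import sqlite3

def open_index(path: str) -> sqlite3.Connection:
    """Open (or create) a full-text index over locally synced mail."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS mail USING fts5(uid, subject, sender, body)"
    )
    return con

def index_message(con, uid, subject, sender, body) -> None:
    """Call this from the IMAP fetch loop for every newly synced message."""
    con.execute("INSERT INTO mail VALUES (?, ?, ?, ?)", (uid, subject, sender, body))
    con.commit()

def search(con, query):
    """Full-text search across subject, sender, and body, best match first."""
    return con.execute(
        "SELECT uid, subject FROM mail WHERE mail MATCH ? ORDER BY rank", (query,)
    ).fetchall()
```

This requires an SQLite build with FTS5 enabled, which stock Python builds generally have; the UID column also gives the agent a stable handle back into the IMAP store.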
As in "you meet a person at a tavern" and then you start chatting.
People provide different personalities to the project, sometimes with avatars and I think some can even change avatars based on their "mood".
I also don't have time to run such a thing but would be up for helping and giving money for it. I'm working on other things including a local-first decentralized database/object store that could be used as storage, similar to OrbitDB, though it's not yet usable.
Mostly I've just been unhappy with having access to either a heavily constrained chat interface or having to create my own full Agent framework like the OP did.
This works really effectively with thinking models, because the thinking eats up tons of context, but also produces very good "summary documents". So you can kind of reap the rewards of thinking without having to sacrifice that juicy sub 50k context. The database also provides a form of fallback, or RAG I suppose, for situations where the summary leaves out important details, but the model must also recognize this and go pull context from the DB.
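A toy version of that summary-plus-fallback pattern, with SQLite as the archive; the character budget is a crude stand-in for real token counting:

```python
import sqlite3

class Memory:
    """Keep short summaries in the prompt; archive full turns for on-demand recall."""

    def __init__(self, budget_chars=200_000):  # rough stand-in for a ~50k-token budget
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE turns (id INTEGER PRIMARY KEY, full_text TEXT)")
        self.summaries = []
        self.budget = budget_chars

    def add_turn(self, full_text: str, summary: str) -> None:
        """Archive the full thinking trace; keep only its summary in context."""
        self.db.execute("INSERT INTO turns (full_text) VALUES (?)", (full_text,))
        self.db.commit()
        self.summaries.append(summary)

    def context(self) -> str:
        """What actually goes back into the prompt: summaries only, budget-capped."""
        return "\n".join(self.summaries)[-self.budget:]

    def recall(self, turn_id: int):
        """Fallback when a summary dropped a detail the model now needs."""
        row = self.db.execute(
            "SELECT full_text FROM turns WHERE id = ?", (turn_id,)
        ).fetchone()
        return row[0] if row else None
```

The real version would re-summarize rather than truncate, and `recall` would be exposed to the model as a tool so it can pull the archive itself.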
Right now I'm trying to use it to make essentially an inventory management/BOM optimization agent for a database of ~10k distinct parts/materials.
The big ones that come to mind are cheap long term caching, and innovations in compaction, differential stuff - like is there a way to only use the parts of the cached input context we need?
1. Claude Desktop 2. Projects 3. MCPs for [Notion, Todoist] and exploring emails + WhatsApp for a next upgrade
This is for me to support productivity workflows for consulting + a startup. There are a few Notion databases - clients, projects, meetings, plus a Jeeves database. The Jeeves database is up to Jeeves how it uses it, but with some guidance. Jeeves uses his own database for things like tracking a migration of all of my previous meeting notes etc under the new structure.
For my databases, I've set up my best practices for use: here's how my minutes look, here's what client one-pagers look like, here's the information to connect it all together, and here's how I manage to-dos. I then drop transcriptions into a new chat, with some text-expanding prompts in Alfred for a few common meetings or similar, and away he goes. He'll turn the transcript into meeting notes, create the to-dos, check everything with me, do a pass, and then go and file everything away into Notion and Todoist via MCP.
It's also self-documenting on this. The Todoist MCP had some bugs, so I instructed Jeeves to run all the various use cases it could, figure out the limitations and strengths, and document it; that's filed away in the Jeeves database that it can pull into context.
It lacks the cron features which I would like, but honestly, a once-a-day prepared prompt dropping into Claude is hardly difficult.
Today I asked Siri “call the last person that texted me”, to try and respond to someone while driving.
Am I surprised it couldn’t do it? Not really at this point, but it is disappointing that there’s such a wide gulf between Siri and even the least capable LLMs.
For others: they use Claude.
- https://docs.mcp.run/tasks/tutorials/telegram-bot
for memories (still not shown in this tutorial) I have created a pantry [0] and a servlet for it [1], and I modified the prompt so that it first checks whether a conversation exists for the given chat id, and stores the result there.
The cool thing is that you can add any servlets on the registry and make your bot as capable as you want.
[0] https://getpantry.cloud/ [1] https://www.mcp.run/evacchi/pantry
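The memories flow described above boils down to one JSON blob per chat id. A sketch against Pantry's REST API as I understand it; the basket-per-chat naming is my own convention, not anything the tutorial prescribes:

```python
import json
import urllib.request

BASE = "https://getpantry.cloud/apiv1/pantry"

def basket_url(pantry_id: str, basket: str) -> str:
    """Pantry addresses each stored blob as a 'basket' under your pantry id."""
    return f"{BASE}/{pantry_id}/basket/{basket}"

def save_conversation(pantry_id: str, chat_id: str, messages) -> int:
    """POST replaces the basket's JSON blob wholesale (one basket per chat)."""
    req = urllib.request.Request(
        basket_url(pantry_id, f"chat-{chat_id}"),
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

On the bot side, the prompt's "check if a conversation exists" step is just a GET against the same URL before deciding whether to create or append.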
Disclaimer: I work at Dylibso :o)