Posted by joshdickson 2 days ago
Today I’m excited to launch OpenNutrition: a free, ODbL-licenced nutrition database of everyday generic, branded, and restaurant foods, a search engine that can browse the web to import new foods, and a companion app that bundles the database and search as a free macro tracking app.
Consistently logging the foods you eat has been shown to support long-term health outcomes (1)(2), but doing so easily depends on having a large, accurate, and up-to-date nutrition database. Free, public databases are often out-of-date, hard to navigate, and missing critical coverage (like branded restaurant foods). User-generated databases can be unreliable or closed-source. Commercial databases come with ongoing, often per-seat licensing costs, and usage restrictions that limit innovation.
As an amateur powerlifter and long-term weight loss maintainer, helping others pursue their health goals is something I care about deeply. After exiting my previous startup last year, I wanted to investigate the possibility of using LLMs to create the database and infrastructure required to make a great food logging app that was cost engineered for free and accessible distribution, as I believe that the availability of these tools is a public good. That led to creating the dataset I’m releasing today; nutritional data is public record, and its organization and dissemination should be, too.
What’s in the database?
- 5,287 common everyday foods, 3,836 prepared and generic restaurant foods, and 4,182 distinct menu items from ~50 popular US restaurant chains; foods have standardized naming, consistent numeric serving sizes, estimated micronutrient profiles, descriptions, and citations/groundings to USDA, AUSNUT, FRIDA, CNF, etc., when possible.
- 313,442 of the most popular US branded grocery products with standardized naming, parsed serving sizes, and additive/allergen data, grounded in branded USDA data; the most popular 1% have estimated micronutrient data, with the goal of full coverage.
Even the largest commercial databases can be frustrating to work with when searching for foods or customizations without existing coverage. To solve this, I created a real-time version of the same approach used to build the core database that can browse the web to learn about new foods or food customizations if needed (e.g., a highly customized Starbucks order). There is a limited demo on the web, and in-app you can log foods with text search, via barcode scan, or by image, all of which can search the web to import foods for you if needed. Foods discovered via these searches are fed back into the database, and I plan to publish updated versions as coverage expands.
- Search & Explore: https://www.opennutrition.app/search
- Methodology/About: https://www.opennutrition.app/about
- Get the iOS App: https://apps.apple.com/us/app/opennutrition-macro-tracker/id...
- Download the dataset: https://www.opennutrition.app/download
OpenNutrition’s iOS app offers free essential logging and a limited number of agentic searches, plus expenditure tracking and ongoing diet recommendations like best-in-class paid apps. A paid tier ($49/year) unlocks additional searches and features (data backup, prioritized micronutrient coverage for logged foods), and helps fund further development and broader library coverage.
I’d love to hear your feedback, questions, and suggestions—whether it’s about the database itself, a really great/bad search result, or the app.
1. Burke et al., 2011, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268700/
2. Patel et al., 2019, https://mhealth.jmir.org/2019/2/e12209/
U.S. law does not require food manufacturers to disclose everything that goes into their products. Under the Code of Federal Regulations (21 CFR § 101.100), there are exemptions to ingredient labeling... For example, flavorings, spices, and incidental additives (like processing aids or anti-caking agents) are not always listed explicitly. Proprietary blends and "natural flavors" can also legally conceal dozens of chemicals (some synthetic) that consumers have no way of identifying.
Micronutrient data is often estimated or missing from labels and restaurant menus, which limits the accuracy of even the best-intentioned databases. Studies show that the nutritional information provided by restaurants and brands is frequently incomplete or inaccurate, especially when it comes to sodium, sugar, and actual serving sizes. (Urban et al., "The Energy Content of Restaurant Foods Without Stated Calorie Information"; Labuza et al., 2008; and others)
IMO, food databases are only as accurate as the source data allows. Until food labeling laws mandate full disclosure and third-party verification, apps like this can support health awareness. Still, they shouldn't be treated as precise medical or dietary guidance, particularly for people with allergies, sensitivities, or chronic health conditions that require strict tracking.
Maybe that would be a good source to challenge and validate the values provided by your LLM approach.
https://www.mext.go.jp/en/policy/science_technology/policy/t...
When something doesn't have a reference listed, and just says "sourced from a publicly available first-party datasource", what does that mean? Crawled from other sources and you'd prefer not to say? The wording does feel a little sketchy when contrasted with entries that do list sources.
When something does list references that don't seem super close to the actual food, what is the process like there for interpreting those values? For example, this Chicken Salad inheriting from Chicken Spread: https://www.opennutrition.app/search/chicken-salad-37mAX17YX...
The quality of the data might feel rough now, but I can see this being valuable for our users even if it's just an opt-in "show estimated micronutrients" or something. Would require labeling values as not being directly from a source of truth.
One thing that a lot of people are missing is that there is already a lot of inaccurate nutrition data out there. Even on information directly from the manufacturer, sometimes there are errors, or just old versions of the product that never get scrubbed from the internet (I imagine the latter case would be tricky for an LLM to deal with too). Just logging your dietary intake in any form will get you 80% of the benefit of tracking via some self awareness of your intake. Of course, it's an easy argument to point out that if you had the choice between verified data and fuzzy LLM data, you should go for the human verified data (for now).
> When something doesn't have a reference listed, and just says "sourced from a publicly available first-party datasource", what does that mean?
It depends, and the degree to which it depends is why the citation is ambiguous (although it is true, if imprecise). My goal is to cite each nutrient individually, but that was simply too costly and time-consuming at the stage of the project at which I did this work.
> what is the process like there for interpreting those values?
Because the degree to which something in the database might be related to those values is so varied, it depends. The reasoning agent had access to those database entries, which is helpful because they tend to contain micronutrient data. It also had access to web data, as well as its own world knowledge, and considered sources in that order. Ultimately it was left up to the agent to decide what the most reasonable fit for each food was, thinking through what an average user likely meant by that entry (e.g. a typical user probably assumes a 'Tomato' is raw), and then to choose the best sources from there. For the chicken salad, it used approximate micronutrient values from the listed references to inform its answer, but adapted the end values to match how the dish is described in the description.
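To make the "in that order" part concrete, here is a minimal sketch of an ordered fallback across the three source types mentioned above. It is only an illustration: the source labels and the `pick_value` function are hypothetical names, and the real agent made a judgment call per food rather than following a fixed rule like this.

```python
from typing import Optional

# Hypothetical labels for the three source types, in the priority order
# described above: reference database entries, then web data, then the
# model's own world knowledge.
SOURCE_PRIORITY = ["reference_db", "web", "model_knowledge"]


def pick_value(
    candidates: dict[str, Optional[float]],
) -> tuple[Optional[str], Optional[float]]:
    """Return (source, value) from the highest-priority source that has a value.

    `candidates` maps a source label to that source's value for one nutrient,
    or None when the source has nothing for it.
    """
    for source in SOURCE_PRIORITY:
        value = candidates.get(source)
        if value is not None:
            return source, value
    # No source had a value for this nutrient.
    return None, None
```

A caller would run this once per nutrient and keep the returned source label next to the value, which is also what per-nutrient citations would need.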
> if you had the choice between verified data and fuzzy LLM data, you should go for the human verified data (for now)
Human verification isn't free, and that means it is not available to a lot of people who can't or don't want to pay for something. But if that's something that someone values, I would certainly not diss the human effort!
I've recently been considering making my own open source nutrition app (since every single one I've looked at seems to either violate my privacy & security, or is designed/works very poorly), but the available "open" nutrition info databases for bootstrapping have seemed poor.
So I looked at the license of this database, and the idea of making it "open" is good and maybe appropriate. But the attribution requirements to promote this other, commercial, product are a little annoying. And could also be a little confusing in app store listings.
> Attribution Requirements: If you display or use any data from this dataset, you must provide clear attribution to "OpenNutrition" with a link to https://www.opennutrition.app in:
> * Every interface where data is displayed
> * Application store listings
> * Your website
> * Legal/about sections
Additionally, I've soured on single companies that call themselves "open". "Open" has a few-decades history in computers, as everyone realized the dangers and costs of proprietary lock-ins, and so created concepts such as "open systems" and "open standards". Appropriating the "open" term for a single company, for something more proprietary than open (like the very proprietary OpenAI that's mentioned many times in https://www.opennutrition.app/about ), rubs a bit the wrong way.
https://wiki.openfoodfacts.org/ODBL_License
You may disagree with each of those projects as well, but, I am following long-standing licensing in this space. I also have used some OFF data for product naming, and as a result, their terms state I have to maintain their license.
Creating these databases involves a tremendous amount of time and effort, and it would not make sense for me to make this data available to commercial entities to use without attribution. The alternative is not an MIT-licensed dataset, it is no dataset.
I appreciate the difficulty of building a good database. Can you say why you created a new one, rather than starting with OpenFoodFacts? (Was it quality issues? Too hard to update? You wanted additional info? You didn't want their licensing terms? You wanted the advertising boost?)
Also IDK where AI is wrt automated scraping but I've had some success feeding recipes into AI and getting the nutrition facts out. The ability to plop a URL in and get a scraped recipe with a name and nutrition facts would be immense.
> If I can join the endless queue of feature requests, the ability to scale the portion size and update the nutrition facts would be great
This is all supported in-app if you're in a country with the ability to download it and have iOS (for now). The web product is more of a demo and isn't intended to be used on a day-to-day basis to track your food consumption, but this is a totally reasonable request.
> Also IDK where AI is wrt automated scraping but I've had some success feeding recipes into AI and getting the nutrition facts out. The ability to plop a URL in and get a scraped recipe with a name and nutrition facts would be immense.
I am not doing this for a few reasons, but you can just take a screenshot of the recipe and use the app to upload that as a meal or recipe, and it should parse out the ingredients and portions for you.
Could you possibly add an option to see the nutrient content per 100g serving? This is way more useful to Europeans than something like a cup as a unit.
In the top-right of the table in the web search, you can change the toggle from "Per Serving" to "Per 100g", though this is just for the table view.
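For anyone working with the downloaded data directly, the "Per Serving" to "Per 100g" conversion is a simple rescale. A minimal sketch, assuming each food gives its serving size in grams; the `per_100g` function and the nutrient keys are made-up names, not the dataset's actual schema:

```python
def per_100g(per_serving: dict[str, float], serving_grams: float) -> dict[str, float]:
    """Rescale per-serving nutrient amounts to per-100g amounts."""
    if serving_grams <= 0:
        raise ValueError("serving_grams must be positive")
    factor = 100.0 / serving_grams
    # Round to two decimals to hide floating-point noise.
    return {name: round(amount * factor, 2) for name, amount in per_serving.items()}


# Illustrative numbers only: a 250 g serving with 200 kcal and 10 g protein.
print(per_100g({"calories_kcal": 200.0, "protein_g": 10.0}, serving_grams=250.0))
# {'calories_kcal': 80.0, 'protein_g': 4.0}
```

Scaling to an arbitrary portion works the same way, with the portion weight in place of 100.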
The big thing I've realized through this exercise is just how much of a creature of habit I am. Inputting what I've eaten over the previous day is mostly copying and pasting rows from previous days' sheets, and I suspect I could simplify input even further. Most people would be in a similar position and should be able to build their own lists by reading the nutritional information already available. When that's not available, I found r/caloriecount to be a useful resource. It need not be perfect either, just as long as you're doing it consistently.
When I first found Cronometer and started using it daily, I did what every developer does and looked at what kind of data exists out there if I wanted to build my own app. The free data from the FDA was pretty bad/limited with massive holes and it would have taken a lot of effort to clean up.
Of course, Cronometer's best data comes from https://www.ncc.umn.edu/food-and-nutrient-database/.
Maybe you can sample your data and validate it against NCC's data via Cronometer to see if your LLM approach has legs when it comes to micronutrients and amino acids. And note that you have AIgen data that NCC's hand-measured database doesn't even have reliably, like choline, which seems like a red flag.
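A rough sketch of what that spot check could look like for a single food, assuming you have the estimated values and the reference values as plain nutrient-to-amount mappings in the same units. The function name, the nutrient keys, and the 20% tolerance are all arbitrary choices for illustration:

```python
def compare_to_reference(
    estimated: dict[str, float],
    reference: dict[str, float],
    tolerance: float = 0.2,
) -> dict[str, list[str]]:
    """Sort each estimated nutrient into within / outside tolerance,
    or no_reference when the reference has no value to compare against."""
    report: dict[str, list[str]] = {"within": [], "outside": [], "no_reference": []}
    for nutrient, est in estimated.items():
        ref = reference.get(nutrient)
        if ref is None:
            # Nothing to validate against (the choline case above).
            report["no_reference"].append(nutrient)
            continue
        if ref == 0:
            ok = est == 0
        else:
            # Relative error against the reference value.
            ok = abs(est - ref) / abs(ref) <= tolerance
        report["within" if ok else "outside"].append(nutrient)
    return report
```

Nutrients that land in `no_reference` are the ones this kind of check cannot say anything about, so they would need a different source of truth.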
Have you asked one of the LLMs used to tell you about the choline content of a food, even ungrounded? They are surprisingly good at reasoning about what kinds of foods tend to contain large amounts of choline because their training datasets will include all kinds of similar data points, even if the single food you're looking for doesn't have it listed explicitly.
Incidentally o3-mini-high got the fried breakfast I added to a tracking app this morning within 50 calories!