Posted by Brajeshwar 1 day ago
So even though they might believe in open source, they put protections in place that ultimately lock it down and thus make it closed source, while trying to keep the impression of being open.
In our journey at AirGradient towards becoming fully open-source hardware (all code and hardware licensed under CC-BY-SA), we had the same concerns but ultimately decided to go all-in and open up everything under an officially approved open-source license.
I believe there are a few important aspects and "protections" that are open-source compatible that help companies protect their investments.
Firstly, requiring attribution is compatible with open source and can give a company a lot of visibility. Competitors probably don't want to attribute another company, so they are less likely to clone the product.
Secondly, using a share-alike license also makes the code unattractive for many other companies to use.
Lastly, I believe the code itself is often not the valuable part compared to the brand value, employees, reputation, business model, network and implicit knowledge that a company builds up.
It really worked for us to go that way with a true open-source license and I hope many others will do it too.
There are already some easy-to-understand licenses like CC in place, and I do hope that they also create awareness around "open washing".
It's absolutely wild to think the deranged BigScience RAIL license, under which the Bloom LLM was released, is open in any way, shape, or form. It has more user-harming restrictions than basically any other LLM license out there.
Microsoft has a decent LLM that I'd consider to be "open source": Phi-3.5, under the MIT license: https://huggingface.co/microsoft/Phi-3.5-vision-instruct
I find it funny how OpenAI was only indirectly mentioned. Still, I'm glad that this columnist is taking a principled stance by arguing against one of the more borderline cases.
Anyone know specifically what he is talking about here?
The only things I'm seeing that I would consider to be clauses on litigation are one that terminates your license if you sue them claiming Llama 3 or its output violates your IP, and a choice-of-venue and choice-of-forum clause.
Several OSI approved licenses have "terminate on patent suit" clauses. Llama 3 is termination on IP suit rather than just on patent suit but I don't see anything in the OSD where that would make a difference.
There's stuff about trademarks, which I assume are the branding clauses he mentions. But I don't see anything obvious on the OSD that such clauses violate.
From https://www.llama.com/llama3/license/
> If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
This seems harmless... until you ask what happens if you start a startup on top of Llama 3, do really well, and later try to get acquired by one of the companies that had more than 700m active users on that date (Apple, Microsoft, Google, etc.).
> You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof).
That's a pretty huge restriction on ways you can use the models. The language "to improve any other large language model" is also incredibly vague.
> (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name.
I love this one, it means that if you fine-tune a model for erotic furry fan fiction you HAVE to call it "Llama 3 Erotic Furry Fan Fiction Writer" or similar.
How exactly would they know if I do?
Also, it doesn't make any sense that they trained this model using whatever stuff they could download from the Internet, but we somehow could not do the same with their models.
Hence, the question.
Simonw's response points out some unusual clauses, and at least one of them looks like it might go against one of the requirements in the OSD. But it is not a litigation or branding clause, and the article specifically called out the litigation and branding clauses.
One can debate "clear", but the AI Act https://eur-lex.europa.eu/eli/reg/2024/1689/oj does say in Recitals 102-104 (mini open source license definition *highlighted*):
---
(102) Software and data, including models, released under a free and open-source licence that allows them to be openly shared and where users can freely access, use, modify and redistribute them or modified versions thereof, can contribute to research and innovation in the market and can provide significant growth opportunities for the Union economy. General-purpose AI models released under free and open-source licences should be considered to ensure high levels of transparency and openness if their parameters, including the weights, the information on the model architecture, and the information on model usage are made publicly available. *The licence should be considered to be free and open-source also when it allows users to run, copy, distribute, study, change and improve software and data, including models under the condition that the original provider of the model is credited, the identical or comparable terms of distribution are respected.*
(103) Free and open-source AI components covers the software and data, including models and general-purpose AI models, tools, services or processes of an AI system. Free and open-source AI components can be provided through different channels, including their development on open repositories. For the purposes of this Regulation, AI components that are provided against a price or otherwise monetised, including through the provision of technical support or other services, including through a software platform, related to the AI component, or the use of personal data for reasons other than exclusively for improving the security, compatibility or interoperability of the software, with the exception of transactions between microenterprises, should not benefit from the exceptions provided to free and open-source AI components. The fact of making AI components available through open repositories should not, in itself, constitute a monetisation.
(104) The providers of general-purpose AI models that are released under a free and open-source licence, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available should be subject to exceptions as regards the transparency-related requirements imposed on general-purpose AI models, unless they can be considered to present a systemic risk, in which case the circumstance that the model is transparent and accompanied by an open-source license should not be considered to be a sufficient reason to exclude compliance with the obligations under this Regulation. In any case, given that the release of general-purpose AI models under free and open-source licence does not necessarily reveal substantial information on the data set used for the training or fine-tuning of the model and on how compliance of copyright law was thereby ensured, the exception provided for general-purpose AI models from compliance with the transparency-related requirements should not concern the obligation to produce a summary about the content used for model training and the obligation to put in place a policy to comply with Union copyright law, in particular to identify and comply with the reservation of rights pursuant to Article 4(3) of Directive (EU) 2019/790 of the European Parliament and of the Council (40).
---
In the articles, open source is expressly referred to as release under an open-source license (see definition in recitals above):
---
[Article 2: Scope]
12. This Regulation does not apply to AI systems released under free and open-source licences, unless they are placed on the market or put into service as high-risk AI systems or as an AI system that falls under Article 5 or 50.
[Article 25: Responsibilities along the AI value chain]
4. The provider of a high-risk AI system and the third party that supplies an AI system, tools, services, components, or processes that are used or integrated in a high-risk AI system shall, by written agreement, specify the necessary information, capabilities, technical access and other assistance based on the generally acknowledged state of the art, in order to enable the provider of the high-risk AI system to fully comply with the obligations set out in this Regulation. This paragraph shall not apply to third parties making accessible to the public tools, services, processes, or components, other than general-purpose AI models, under a free and open-source licence.
[Article 54: Authorised representatives of providers of general-purpose AI models]
6. The obligation set out in this Article shall not apply to providers of general-purpose AI models that are released under a free and open-source licence that allows for the access, usage, modification, and distribution of the model, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available, unless the general-purpose AI models present systemic risks.