What criteria do you think are absolutely essential for an AI model to qualify as open source?
or ... use the latest definitions: https://siliconangle.com/2024/10/28/osi-clarifies-makes-ai-systems-open-source-open-models-fall-short/
The comment below is not legal advice; consult your lawyer.
I agree with James Harris on the criteria for open source. However, a related issue is the type of use permitted under the applicable license(s). There are dozens of licensing models for open-source projects, from the commonly used Apache 2.0 to the less well-known Chicken Dance License v0.2 (CDL) created by Andrew Harris. The CDL allows a prospective user to do the chicken dance on social media instead of distributing the new source code created from the licensed code.
There is a common misconception that open source means free to use for any purpose. In practice, the licenses of some LLMs prohibit commercial use. Unlike the example above, not all license holders take a light-hearted approach to intellectual property rules.
Ensure you are aware of any restrictions or affirmative requirements (such as public redistribution) at the outset of the project to prevent costly legal issues later.
When we refer to open source, the model’s architecture and source code should be accessible to anyone, under a genuine open-source license. Transparency about data sources and datasets, along with comprehensive documentation, is essential. Data sources deserve particular emphasis because they reveal what the model has been trained on, which can raise ethical and privacy concerns. Ideally, the model should be community-driven, allowing developers to contribute and enhance it collaboratively. A significant advantage of this open-source approach is that open access to data and source code facilitates compliance with ethical and privacy standards and makes accountability possible.
Our open source aiSSEMBLE solution lead at Booz Allen offers the following criteria:
1. Source code transparency: The source code should be publicly accessible, including associated dependencies.
2. Model is Accessible: The model should be publicly accessible in a repository or model catalog.
3. Model Weights are accessible: The model weights that govern how the model behaves should be viewable and available in a serialized format that enables transfer learning (fine-tuning).
4. Technical Documentation: Documentation sufficient so that a third party is able to install, deploy and execute inference on the model.
5. License Requirements: The specific open-source license and terms of use should be clearly identified.
One additional consideration would be Reproducibility Requirements: The requirements to reproduce the model from scratch should be clearly articulated, including:
a. Versioned dataset(s) used and their location
b. Hardware requirements for training
c. Hyperparameters and other information
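To make the checklist above concrete, here is a minimal Python sketch that records each criterion for a release and reports which ones are still unmet. The class, field, and function names (ModelRelease, ReproducibilityInfo, unmet_criteria) are hypothetical, made up for illustration; they are not part of aiSSEMBLE or any published definition, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReproducibilityInfo:
    """Hypothetical record for the optional reproducibility items (a-c)."""
    dataset_versions: List[str] = field(default_factory=list)  # a. versioned datasets and locations
    hardware_requirements: str = ""                            # b. hardware needed for training
    hyperparameters: Dict[str, object] = field(default_factory=dict)  # c. hyperparameters etc.

    def is_complete(self) -> bool:
        # Complete only when all three items have been filled in.
        return bool(self.dataset_versions
                    and self.hardware_requirements
                    and self.hyperparameters)


@dataclass
class ModelRelease:
    """Hypothetical self-assessment of a model release against criteria 1-5."""
    source_code_public: bool          # 1. source code and dependencies are public
    model_in_public_catalog: bool     # 2. model is in a public repository or catalog
    weights_serialized: bool          # 3. weights available in a serialized format
    docs_cover_inference: bool        # 4. docs allow install, deploy, and inference
    license_id: Optional[str] = None  # 5. clearly identified license
    reproducibility: Optional[ReproducibilityInfo] = None  # additional consideration


def unmet_criteria(release: ModelRelease) -> List[str]:
    """Return the names of the criteria the release does not yet satisfy."""
    checks = [
        ("source code transparency", release.source_code_public),
        ("model accessible", release.model_in_public_catalog),
        ("model weights accessible", release.weights_serialized),
        ("technical documentation", release.docs_cover_inference),
        ("license identified", bool(release.license_id)),
        ("reproducibility requirements",
         release.reproducibility is not None and release.reproducibility.is_complete()),
    ]
    return [name for name, passed in checks if not passed]


if __name__ == "__main__":
    # Placeholder values: weights and docs are published, but the training
    # datasets and hyperparameters have not been documented.
    release = ModelRelease(
        source_code_public=True,
        model_in_public_catalog=True,
        weights_serialized=True,
        docs_cover_inference=True,
        license_id="Apache-2.0",
    )
    print(unmet_criteria(release))
```

Treating reproducibility as one more check, rather than a hard requirement, mirrors the way it is framed above as an additional consideration; a stricter reading could refuse the open-source label whenever that item is unmet.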
I can access the code and training datasets with full transparency.