
OPEN SOURCE LICENSES NEED TO LEAVE THE 1980S AND EVOLVE TO DEAL WITH AI



TIME TO GET WITH THE PROGRAM... BEFORE ARTIFICIAL INTELLIGENCE DOES

Steven J. Vaughan-Nichols
Fri 23 Jun 2023 // 08:30 UTC




Opinion

Free software and open source licenses evolved to deal with code in the 1970s
and '80s. Today they must transform again to deal with AI models.

AI was born from open source software. But free software and open source
licenses, which rest on copyright law and were written to deal with software
code, are not a good fit for the large language model (LLM) neural nets and the
datasets that fuel open source AI. Since many programming datasets, in
particular, are built from free software and open source code, something must
be done. That's why Stefano Maffulli, executive director of the Open Source
Initiative (OSI), and a host of other open source and AI leaders are working on
ways to combine AI and open source licensing that make sense for both.

Lest you think this is some kind of theoretical legal discussion with no impact
on the real world, think again. Consider J. Doe 1 et al vs GitHub. The
plaintiffs in this case, filed in the US District Court for the Northern
District of California, allege that Microsoft, OpenAI, and GitHub ripped off
their open source code via their commercial AI-based systems, OpenAI's Codex and
GitHub's Copilot. Specifically, the plaintiffs claim that the code these tools
"suggest" often consists of near-identical copies of code scraped from public
GitHub repositories, without the attributions the open source licenses require.



This case continues. The amended complaint includes accusations of violating the
Digital Millennium Copyright Act, breach of contract (open source license
violations), unjust enrichment, unfair competition, and a second breach of
contract claim (selling licensed materials in violation of GitHub's policies).




Don't think this kind of lawsuit is just Microsoft's problem. It's not. Sean
O'Brien, a Yale Law School lecturer in cybersecurity and founder of the Yale
Privacy Lab, told my colleague David Gewirtz: "I believe there will soon be an
entire sub-industry of trolling that mirrors patent trolls, but this time
surrounding AI-generated works. A feedback loop is created as more authors use
AI-powered tools to ship code under proprietary licenses. Software ecosystems
will be polluted with proprietary code that will be the subject of
cease-and-desist claims by enterprising firms."

He's right. I've been covering patent trolls for decades. I guarantee that
licensing trolls will come after "your" ChatGPT and Copilot code. 



Some people, such as Felix Reda, a German researcher and politician, claim that
all AI-produced code is public domain. US attorney Richard Santalesa, a founding
member of the SmartEdgeLaw Group, observed to Gewirtz that AI-generated code
raises both contract law and copyright law issues, and they're not the same
thing. Santalesa believes companies producing AI-generated code will "as with
all of their other IP, deem their provided materials – including AI-generated
code – as their property." In any case, public domain code is not the same
thing as open source code.

On top of all that, there's the whole issue of how datasets should be licensed.
Many "open" datasets are released under various open source licenses, but those
licenses usually aren't a good fit for data.

In our conversation, the OSI's Maffulli explained how the various artifacts
produced by AI and machine learning systems fall under different laws and
regulations. The open source community must determine which of these laws best
serve its interests. Maffulli compared the current situation to the late '70s
and '80s, when software emerged as a distinct discipline and copyright began to
be applied to source code and binaries.

We're at a similar crossroads today. AI software such as TensorFlow, PyTorch,
and the Hugging Face Hub works well under its open source licenses. The new AI
artifacts are another story. Datasets, models, weights, and the like don't fit
squarely into the traditional copyright model. Maffulli argued that rather than
relying on "hacks," the tech community should devise something new that better
aligns with our objectives.

Specifically, Maffulli noted, open source licenses designed for software might
not be the best fit for AI artifacts. For instance, while the MIT License's
broad freedoms could plausibly apply to a model, questions arise with more
complex licenses such as the Apache License or the GPL. Maffulli also addressed
the challenges of applying open source principles in sensitive fields like
healthcare, where regulations around data access pose unique hurdles. The short
version: medical data can't be open sourced.



Meanwhile, the datasets behind most commercial LLMs are black boxes. We
literally don't know what's in them. So we end up, as the Electronic Frontier
Foundation (EFF) puts it, with "Garbage In, Gospel Out." What we need, the EFF
concludes, is open data.

That's why, said Maffulli, the OSI is working with Open Forum Europe, Creative
Commons, the Wikimedia Foundation, Hugging Face, GitHub, the Linux Foundation,
the ACLU, Mozilla, and the Internet Archive on a draft that defines a common
understanding of open source AI principles. This will be "critical in
conversations with legislative bodies." Even now, EU, US, and UK government
agencies are struggling to develop AI regulation, and they're woefully
under-equipped to deal with the issues.

Maffulli concluded by saying we should start with "a return to the basics": the
GNU Manifesto, which predates most licenses and sets the "North Star" for the
open source movement. He suggested that its principles remain surprisingly
relevant when applied to AI systems. By focusing on first principles, we'll be
better able to navigate this complex intersection of AI and open source. ®

The Register Biting the hand that feeds IT


Copyright. All rights reserved © 1998–2023