• 66 Posts
  • 597 Comments
Joined 1 year ago
Cake day: September 13th, 2024




  • The question is: What is an effective legal framework that focuses on the precise harms, doesn’t allow AI vendors to easily evade accountability, and doesn’t inflict widespread collateral damage?

    This is entirely my opinion and I’m likely wrong about many things, but at minimum:

    1. The model has to be open source and freely downloadable, runnable, and copyleft, satisfying the distribution license requirements of copyleft source material (I’m willing to give a free pass to making it copyleft in general, as different copyleft licenses can have different and contradictory distribution license requirements, but IMO the leap from permissive to copyleft is the more important part). I suspect this alone will kill the AI bubble, because as soon as they can’t exclusively profit off it they won’t see AI as “the future” anymore.

    2. All training data needs to be freely downloadable and independently hosted by the AI creator. Goes without saying that only material you can legally copy and host on your own server can be used as training data. This solves the IP theft issue, as IMO if your work is licensed such that it can be redistributed in its entirety, it should logically also be okay to use it as training data. And if you can’t even legally host it on your own server, using it to train AI is off the table. And the independently hosted dataset (complete with metadata about where it came from) also serves as attribution, as you can then search the training data for creators.

    3. Pay server owners for use of their resources. If you’re scraping for AI you at the very least need to have a way for server owners to send you bills. And no content can be scraped from the original source more than once, see point 2.

    4. Either have a mechanism of tracking acknowledgement and accurately generating references along with the code, or if that’s too challenging, I’m personally also okay with a blanket policy where anything AI generated is public domain. The idea that you can use AI generated code derived from open source in your proprietary app, and can then sue anyone who has the audacity to copy your AI generated code, is ridiculous and unacceptable.
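
    To make points 2 and 3 a bit more concrete, here is a minimal sketch of what a provenance manifest for a hosted training set could look like. Everything in it (`TrainingRecord`, `Manifest`, the field names) is hypothetical and made up for illustration, not an existing tool or standard. It only shows the two properties I care about: each source URL is ingested at most once, and the dataset can be searched by creator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrainingRecord:
    """One item in the hosted dataset (hypothetical schema)."""
    source_url: str  # where the item was originally fetched from
    creator: str     # who to credit
    license: str     # license under which it may be redistributed
    local_path: str  # location of the copy hosted by the dataset maintainer


class Manifest:
    """Index of training records, keyed by source URL."""

    def __init__(self) -> None:
        self._by_url: Dict[str, TrainingRecord] = {}

    def add(self, record: TrainingRecord) -> bool:
        """Register a record. Returns False if the URL was already ingested,
        so the caller knows to reuse the hosted copy instead of re-scraping."""
        if record.source_url in self._by_url:
            return False
        self._by_url[record.source_url] = record
        return True

    def by_creator(self, creator: str) -> List[TrainingRecord]:
        """Return every record attributed to the given creator."""
        return [r for r in self._by_url.values() if r.creator == creator]
```

    A creator could then look themselves up with something like `manifest.by_creator("alice")` and see exactly which of their works ended up in the training data.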


  • “Wait, not like that”: Free and open access in the age of generative AI

    I hate this take. “Open source” is not “public domain” or “free rein to do whatever the hell you want with no acknowledgement to the original creator.” Even the most permissive MIT license has terms that every single AI company shamelessly violates. All code derived from open source code needs to at the very least reference the original author, so unless the AI can reliably and accurately cite where the code it generates came from, all AI generated code that gets incorporated into any publicly distributed software violates the license of every single open source project it has ever scraped.

    That’s saying nothing about projects with copyleft licenses that place conditions on how the code can then be distributed. Can AI reliably avoid using information from those codebases when generating proprietary code? No? And that’s not a problem because?

    I absolutely hate the hypocrisy that permeates the discourse around AI and copyright. Knocking off Studio Ghibli’s art style is apparently the worst atrocity you can commit but god forbid open source developers, most of whom are working for free, have similar complaints about how their work is used.

    Just because you “can’t” obey the license terms due to some technical limitation doesn’t mean you deserve a free pass from them. It means the technology is either too immature to be used or shouldn’t be used at all. Also, why aren’t they using LLMs when scraping to read the licenses and exclude anything other than pure public domain? Or better yet, use literally last century’s technology to read the robots.txt and actually respect it. It’s not even a technical limitation; it’s a case of “doing the right thing is too restrictive and won’t let us accomplish what we want, so we demand that the right thing be expanded to cover what we’re trying to do.”
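
    For anyone who thinks respecting robots.txt is hard, here is a minimal sketch using Python’s standard `urllib.robotparser`. The bot name `ExampleAIBot`, the `allowed_to_fetch` helper, and the example rules are all made up for illustration; a real crawler would fetch the site’s actual robots.txt rather than use a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bans one AI crawler outright and keeps
# everyone else out of /private/ only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_to_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True only if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/blog/post"
    for agent in ("ExampleAIBot", "SomeBrowser"):
        verdict = "may fetch" if allowed_to_fetch(EXAMPLE_ROBOTS_TXT, agent, url) else "must skip"
        print(f"{agent} {verdict} {url}")
```

    A scraper would call a check like this before every request and simply skip anything it isn’t allowed to fetch.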

    Open source only has anywhere between one and two core demands: Credit me for my work and potentially distribute derivatives in a way I can still take advantage of. And even that’s not good enough for these AI chuds, they think we’re the unreasonable ones for having these demands and not letting them use our code with no strings attached.

    This is where many creators find themselves today, particularly in response to AI training. But the solutions they’re reaching for — more restrictive licenses, paywalls, or not publishing at all — risk destroying the very commons they originally set out to build.

    Yeah, blame the people getting exploited and not the people doing the exploiting, why don’t you.

    Particularly with AI, there’s also no indication that tightening the license even works. We already know that major AI companies have been training their models on all rights reserved works in their ongoing efforts to ingest as much data as possible. Such training may prove to have been permissible in US courts under fair use, and it’s probably best that it does.

    No. Fuck that. There’s nothing fair about scraping an independent creator’s website (costing them real money) and then making massive profits from it. The creator literally fucking paid to have their work stolen.

    If a kid learns that carbon dioxide traps heat in Earth’s atmosphere or how to calculate compound interest thanks to an editor’s work on a Wikipedia article, does it really matter if they learned it via ChatGPT or by asking Siri or from opening a browser and visiting Wikipedia.org?

    Yes. And the fact that it’s stolen isn’t even the biggest problem by a long shot. In fact, even Wikipedia is a pretty shitty source; do what your high school teacher said you should do and search Wikipedia for citations, not the articles themselves.

    Don’t let AI teach you anything you can’t instantly verify with an authoritative source. It doesn’t know anything and therefore can’t teach anything by definition.

    Instead of worrying about “wait, not like that”, I think we need to reframe the conversation to […] “wait, not in ways that threaten open access itself”.

    Okay, let’s do that then. All AI training threatens open access itself. If not by ensuring the creator can never make money to sustain their work, then by LITERALLY COSTING THE CREATORS MONEY WHEN THEIR CONTENT IS SCRAPED! So the conclusion hasn’t changed.

    The true threat from AI models training on open access material is not that more people may access knowledge thanks to new modalities. It’s that those models may stifle Wikipedia and other free knowledge repositories, benefiting from the labor, money, and care that goes into supporting them while also bleeding them dry. It’s that trillion dollar companies become the sole arbiters of access to knowledge after subsuming the painstaking work of those who made knowledge free to all, killing those projects in the process.

    And how does shaming the victims of that knowledge theft for having the audacity to try and do something about it help exactly?

    Anyone at an AI company who stops to think for half a second should be able to recognize they have a vampiric relationship with the commons.

    […]

    And yet many AI companies seem to give very little thought to this,

    “Anyone at a Southern slave plantation who stops to think for half a second should be able to recognize they have a vampiric relationship with their black slaves.” Yeah, they know. That’s the point.





  • HiddenLayer555@lemmy.ml to Programmer Humor@lemmy.ml · electron.jxl · 16 days ago

    it was called CROSS PLATFORM APPS

    Absolutely not unless it’s as sandboxed as the web (which even the web isn’t sandboxed that well).

    Working with software has only made me not trust software (that’s not open source).

    Why we’re giving any random software full user level access in 2026 is beyond me.





  • I think having a TPM enables a number of worthwhile security features.

    But most of those security features place the TPM at the root of trust, something that is SEVERELY undermined by the fact that it is not open source, meaning it is inherently untrustworthy.

    Is it not the one chip for which we should demand, and accept nothing less than, complete openness in its implementation and complete control by the person who owns the device? I also think the types of protections it grants in theory are very good, but the fact that it’s proprietary means it’s terrible at actually granting you those protections.


  • HiddenLayer555@lemmy.ml to Asklemmy@lemmy.ml · Reside, vacation, party. · 21 days ago

    Reside: Vancouver. Biased because I mostly grew up here and am lucky enough to still live here but I really do think it’s something special.

    Vacation: Probably Qing Huang Dao. Gorgeous Chinese beach city that I’ve only ever been to twice in my childhood, but it’s also really nostalgic for that reason.

    Party (hardest one for me because I don’t party): Wanted to say New York or Vegas because nothing in Canada really compares to those, but with current events and my skin colour I’m going with Toronto instead. I’d like to not end up at a concentration camp while partying, but more generally I know the laws of my own country better than those of somewhere else, which I feel is a benefit if you want to party.



  • I don’t drive so it has more to do with the quality and reliability of transit going to a place than how many traveling minutes it is. Taking a train cross region feels faster than taking a bus cross town even if they take the same time because the train is more frequent, much more reliable, and generally has a lower mental barrier and therefore lower perceived distance.

    Transfers also add significant perceived distance even if they don’t add much actual travel time, because it’s more annoying than just sitting or standing there. The timing of your next bus is also another thing that can go wrong and significantly delay your trip. I often find myself choosing physically longer routes that have fewer transfers.

    Generally, if I’m near the train system, everywhere on the current line is “close” because it’s a one seat ride away on the highest quality transit mode. If not, I’d say 10 stops in either direction on the bus lines that directly serve the street I’m on is similarly “close.” Relatively, I consider one train stop to have the same “closeness” as two or three bus stops regardless of distance, but only because I would just walk if it was one or two bus stops away.

    Also, because I’m walking for my last mile transport, everything feels significantly closer when the weather is favourable. Rain adds some distance but I live in Vancouver so I’ve mostly stopped caring. If it snows though, everything outside my house isn’t “close” anymore because of the gauntlet of death Vancouver streets turn into when snow or ice is involved.


  • I think the ethnicities behind the crime are why it’s been so widely covered and propagandized, more than the crime itself. I don’t think it would be hard to imagine how the media would have treated it had it been all white people. This would have been a one and done story at the national level, if not completely relegated to local news.

    I don’t think it’s contradictory to condemn the crime itself while also calling out the media’s fixation on this specific crime when it conveniently ignores many others. The crime itself is fairly insignificant in the grand scheme of things, especially considering how much tax money gets embezzled by much more powerful (and white dominated) organizations and government agencies every single day, but the media fixation because they’re sCaRy FoReIgNeRs is systemic and much more of a threat.