• 0 Posts
  • 68 Comments
Joined 1 year ago
Cake day: June 11th, 2023




  • Eeeh, I still think diving into the weeds of the technical is the wrong way to approach it. Their argument is that training isn’t copyright violation, not that sufficient training dilutes the violation.

    Even if trained only on one source, it’s quite unlikely that it would generate copyright infringing output. It would be vastly less intelligible, likely to the point of overtly garbled words and sentences lacking much in the way of grammar.

    Whether what they’re doing is technically an infringement, or how it works, is entirely beside the point in a discussion of whether it should be infringement or permitted.


  • Basing your argument around how the model or training system works doesn’t seem like the best way to frame your point to me. It invites a lot of mucking about in the details of how the systems do or don’t work, how humans learn, and what “learning” and “knowledge” actually are.

    I’m a human as far as I know, and it’s trivial for me to regurgitate my training data. I regularly say things that are either directly references to things I’ve heard, or accidentally copy them, sometimes with errors.
    Would you argue that I’m just a statistical collage of the things I’ve experienced, seen or read? My brain has as many copies of my training data in it as the AI model, namely zero, but “Captain Picard of the USS Enterprise sat down for a rousing game of chess with his friend Sherlock Holmes, and then Shakespeare came in dressed like Mickey Mouse and said ‘to be or not to be, that is the question, for tis nobler in the heart’ or something”. Direct copies of someone else’s work, as well as multiple copyright infringements.
    I’m also shit at drawing with perspective. It comes across like a drunk toddler trying their hand at cubism.

    Arguing about how the model works or the deficiencies of it to justify treating it differently just invites fixing those issues and repeating the same conversation later. What if we make one that does work how humans do in your opinion? Or it properly actually extracts the information in a way that isn’t just statistically inferred patterns, whatever the distinction there is? Does that suddenly make it different?

    You don’t need to get bogged down in the muck of the technical to say that even if you concede every technical point, we can still say that a non-sentient machine learning system can be held to different standards with regard to copyright law than a sentient person. A person gets to buy a book, read it, and then carry around that information in their head and use it however they want. Not-A-Person does not get to read a book and hold that information without consent of the author.
    Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

    Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.

    I think all the AI stuff is cool, fun and interesting. I also think that letting it train on everything regardless of the creators’ wishes has too much opportunity to make everything garbage. Same for letting it produce content that isn’t labeled or cited.
    If they can find a way to do and use the cool stuff without making things worse, they should focus on that.


  • Yup. :/

    I looked it up and it’s not unusual for sentencing in New York to take several months, but I would have been much happier if the political realities had pushed things to move faster.

    Having read the prosecution’s response to the request for delay that basically said “everything the defense said justifying a delay was wrong, here’s why a delay would actually be a good idea”, it feels hard to blame the judge too much for granting the delay.
    Even though none of the reasons seem to be based on sound legal principles and are at best based on practical considerations.







  • As written the headline is pretty bad, but it seems their argument is that they should be able to train from publicly available copyrighted information, like blog posts and social media, and not from private copyrighted information like movies or books.

    You can certainly argue that “downloading public copyrighted information for the purposes of model training” should be treated differently from “downloading public copyrighted information for the intended use of the copyright holder”, but it feels disingenuous to put this comment itself, to which someone has a copyright, into the same category as something not shared publicly like a paid article or a book.

    Personally, I think it’s a lot like search engines. If you make something public, someone can analyze it, link to it, or do other derivative things with it, but they can’t copy it and share the copy with others.


  • That is a good point.
    On the flip side, they’re largely not selling something that has any physical finiteness to it anymore, and the sales volumes have increased drastically, resulting in significantly higher profits despite a smaller inflation-adjusted unit cost.

    The cost of a good decreasing as an industry matures feels right. Jello cost 23¢ a box in 1940. Adjusted for inflation it should cost $5.17 a box now, but it’s only $1.59.
    When there’s 2 games to buy, they can be justifiably more expensive than when there’s a massive surplus.
    The games are different, but it’s not like consumers can’t find a different one they’ll also enjoy if the first one they look at is too expensive.
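
    To put rough numbers on the Jello comparison, here’s a quick sketch using only the three prices quoted above. The function names are made up for illustration, and the “multiplier” is just what those figures imply, not an official CPI value.

```python
# Prices quoted in the comment above (US dollars).
PRICE_1940 = 0.23          # box of Jello in 1940
PRICE_1940_ADJUSTED = 5.17 # the 1940 price adjusted for inflation
PRICE_NOW = 1.59           # what a box actually costs now


def implied_multiplier(old_price, adjusted_price):
    """How many times larger the inflation-adjusted price is than the original."""
    return adjusted_price / old_price


def real_price_change(adjusted_price, actual_price):
    """Fractional change of the actual price relative to the adjusted one.

    Negative means the good got cheaper in real terms.
    """
    return (actual_price - adjusted_price) / adjusted_price


multiplier = implied_multiplier(PRICE_1940, PRICE_1940_ADJUSTED)
change = real_price_change(PRICE_1940_ADJUSTED, PRICE_NOW)

print(f"implied inflation multiplier: {multiplier:.1f}x")
print(f"real price change: {change:.0%}")
```

    With those inputs, the box is roughly 69% cheaper in real terms than the 1940 price would suggest, which is the “cost decreases as an industry matures” pattern.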

    Inflation has made $60 less valuable, but they’re not selling to the same market that they were 30 years ago either.
    It’s hard to use inflation to justify raising prices or adding exploitative features when you’re already seeing higher inflation adjusted profits due to a larger more accessible market, lower risk due to reduced publishing overhead, and more options for consumers, which would be expected to bring prices down.


  • Only for the sake of specific-ness: CrowdStrike forced the update, not the OS. :) And yeah, that’s generally unheard of. Like so unheard of that it’s a professional-recommendation-reversing occurrence, based purely on how they could release a product that bypassed user expectations so aggressively and without any documentation that it was happening.
    I work in the security sector with computers, and before all this I would have said “yeah, CrowdStrike is a widely deployed product and if it fits your requirements it’s reasonable to use”. Now I would strongly recommend against it, not because of this incident, but because of the engineering, product and safety culture that thought it was okay to design a product this way without user controls or even documentation around any part of it. Their post-incident report is horrifying in the testing it communicates they weren’t doing.

    I wouldn’t advise someone to use windows for a server, but that’s a preference thing, not a “hazard” thing. If they had a working windows setup I wouldn’t even comment on it.

    What it sounds like happened to Delta is that they were set up roughly like other companies. Maybe a little loose on different setups at different airports. That’s a forgivable level of slop. Where they differed was in having a piece of software that couldn’t handle being entirely shut off, and then immediately loaded to 100% with no ease in.
    Scheduling is a type of computer problem that’s very susceptible to getting increasingly difficult the bigger the number of things being worked with. Like exponentially more difficult, but it’s actually worse than exponential.
    I know nothing about their system, but I can guess that it worked fine when it was running because it needed to make a small number of scheduling decisions at a time, and could look at the existing state of things as a decided “fact”. Start the system fresh, and suddenly it needs to compare the hundreds of airports, hundreds more planes and crews, and thousands of possible routes to each other and is looking at literally billions of possible schedules which it needs to sort through to pick the best ones.
    Other airlines appear to have scheduling systems that were either developed using more modern techniques that can find “good enough” very efficiently, or the application was written to fail less easily or had better hardware so it could work faster.
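
    As a toy illustration of “worse than exponential” (an assumption-heavy sketch, nothing to do with how Delta’s actual software works): if a “schedule” is just a one-to-one pairing of n crews with n flights, there are n! possible schedules, and n! outgrows 2^n very quickly.

```python
import math


def exponential_count(n):
    """Baseline for comparison: plain exponential growth, 2^n."""
    return 2 ** n


def pairing_count(n):
    """Number of ways to pair n crews with n flights one-to-one: n!.

    This is a deliberately simplified, hypothetical model of a schedule.
    """
    return math.factorial(n)


for n in (5, 10, 13, 20):
    print(f"n={n:>2}  2^n={exponential_count(n):>12,}  n!={pairing_count(n):>26,}")
```

    Even in this stripped-down model, 13 crews and 13 flights already give over six billion possible pairings, which is why a cold start that has to consider everything at once is so much harder than making a few incremental decisions against a known state.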

    For whatever reason, Delta was the only one that had the key bit of software fail to come back up.

    Delta has higher costs than the other airlines because there are regulations protecting travelers and ensuring they get appropriate refunds and accommodations if their flights are cancelled. Other airlines were able to shift people around and get going again before they had to pay out too much in ticket refunds, food, or hotels.
    Delta is arguing that CrowdStrike is responsible for the total cost of the incident, which would include all the refunds and hotels, since they caused it.
    CrowdStrike recently responded that they think their liability is no greater than $10mil. They seem to be taking the position that they’re only responsible for the immediate effects, so things like diverting aircraft, needing to manually poke systems and all that.

    “Yeah I t-boned you when I ran a red light, so I owe you for the damage to your car, but your car was a dangerous piece of crap so I’m not responsible for your broken legs, hospital bills or lost wages”.
    I think the judge will find that running the red light means they are responsible for the extended consequences of their actions, even if they’re vastly in excess of what anyone would have predicted up front, but that the car was pretty dangerous so it was really only a matter of time so it’s not all on them.

    If there’s one thing I’ve learned from reading about court cases, it’s that a civil suit like this will get really complicated with how they assess damages and responsibilities.

    And yeah, there’s no perfect answer for computer system stability. You can never get perfect stability, and each 9 you add to your 99.9% uptime costs more than the last one. Eventually you have teams of people whose full time job is keeping the system up for an additional second per year. And even with that, sometimes Google still goes down because it’s all a numbers game.
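
    To make the “each 9 costs more than the last” point concrete, here’s a small sketch of how much downtime per year each extra nine still allows. It assumes a 365-day year, and the function name is just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def allowed_downtime_seconds(nines):
    """Yearly downtime budget for an availability with the given number of nines.

    nines=3 means 99.9% uptime, nines=4 means 99.99%, and so on.
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


for nines in range(2, 8):
    seconds = allowed_downtime_seconds(nines)
    print(f"{nines} nines: about {seconds:,.1f} seconds of downtime per year")
```

    Each added nine only buys back a tenth of the downtime the previous one did, so the absolute gain keeps shrinking while the effort to get there keeps growing.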

    I didn’t mean to ramble so long, but I have opinions and I get type-y before bed. :)



  • You are correct that Delta was an outlier, but it wasn’t with regards to the scale of the outage, it was that their scheduling software was down far longer and they handled a lot of the customer side of things significantly less well.

    Generally, your protection against operating system issues is the aforementioned restriction on changes and how they go out.
    If something is stable, you can expect it to remain stable unless something changes or random chance breaks something.
    The operational cost of running multiple operating systems in production like you describe would be high. Typically software is only written to work on one platform, and while it can be modified to work on others, it’s usually a cost with no benefit outside of a consumer environment.
    Different operating systems have different performance characteristics you need to factor in for load scaling, different security models, and different maintenance requirements.
    Often, but not always, server administrators will focus on one OS, so adding more to the mix can mean people are rusty with whichever is your backup, which can be worse than just focusing on fixing the issue with the primary.
    OS bugs are rare, and they usually manifest early or randomly. It’s why production deployments tend to use the OS as long as it’s supported: change means learning the new issues and you’ve probably already encountered all the bullshit with what you’re currently using. That’s why the Linux distros tend to have long term support versions, and Windows Server tends to just get support for a long time with terrible documentation.

    I’m a Linux guy, so defending Windows feels weird, and I want to include that I don’t think anyone should use it, particularly for a server, but the professional in me acknowledges that it’s a perfectly functional hammer.

    As we’ve learned more, I’ve become more disparaging of Delta’s choice to not keep the scheduling system modernized in a way that could recover faster, and to not invest enough in making systems homogeneous across different airports. I still think that these issues are largely independent of their actual disaster recovery or resiliency plans.
    Inevitably, the lawsuits will determine that the blame for the damage is split between the two of them. My bet is 70/30 CrowdStrike/Delta, since Delta can easily demonstrate that the issue was fundamentally caused by CrowdStrike and negatively impacted other airlines and businesses in general. Some was clearly Delta’s fault for just failing to keep a system modernized to handle a massive shift like this, and they would have been similarly disrupted by any outage with flight cancellations.


  • The current geological era will have measurable levels of radioactive isotopes different from expectations. Just like we can tell when plants started making oxygen from the fossil record and rock chemistry, we’ll be able to tell when humans started having some physics fun time in the atmosphere.

    Another fun fact is that we’ve added a decent set of new markers for future archeologists to date things with.
    I think we’ve caused some of the carbon dating techniques to need a little * in the future, since we’ve shifted the baseline level around quite a bit.
    We also added some new radioactive isotopes to the mix, like strontium, which show up in your teeth. Not new-new, but measurably increased levels.
    We can actually use the levels in your teeth to predict your age within a year or two.

    The discovery of this is part of what motivated the partial nuclear test ban that had both the US and Soviet Union stop testing in the atmosphere.


  • Ugh, that’s shitty. Companies keep acting like they’re confused about why they can’t find anyone when everyone knows the problem is that people just want better than disgusting benefits, mistreatment, shit pay and legal loopholes that somehow make that the workers’ fault.

    There’s a place for contractors in the employment landscape. A bakery doesn’t need a staff plumber. A clothing store might only need a web designer for a few months to rebuild the website.
    But a delivery company saying the people who do their deliveries, the core of their business, every day on an ongoing basis indefinitely are contractors? That’s so obviously bullshit. We need oppressively stiff penalties for shit like that to keep them from doing it, because as long as it’s cheaper to do it wrong, they have no reason to try to do it right.


  • In this case they’re employees of a “delivery service partner”.

    It’s roughly the same thing, except instead of driving a semitruck, you’re hired as a contractor to hire and manage the delivery drivers, do everything Amazon tells you, and make sure your drivers do everything Amazon tells them as well.
    That way Amazon can pressure you into abusing the drivers and claim it wasn’t them, it’s just that they hire terrible contractors. They can also refuse to negotiate, because the drivers don’t technically work for them.

    Which, quite clearly, isn’t a thing you’re allowed to do, since even if your employees get their checks from someone else they’re still your employees since all the work they do is for you.