On the internet, nobody knows you are Australian.

also https://lemm.ee/u/MargotRobbie

To tell you the truth, I don’t know who I am either. Somebody sincere, perhaps.

But if you ever read this one day, I hope that you are as proud of me, as I am of the person I imagined you to be.

  • 2 Posts
  • 14 Comments
Joined 1 year ago
Cake day: June 17th, 2023


  • Reddit, and by extension, Lemmy, offers the ideal format for LLM datasets: human-generated conversational comments, which, unlike traditional forums, are organized in a branched, nested format and scored with votes in the same way that LLM reward models are built.

    There is really no way of knowing, much less preventing, public-facing data from being scraped and used to build LLMs, but let’s do a thought experiment: what if, hypothetically speaking, there is some particular individual who wanted to poison that dataset with shitposts in a way that is hard to detect or remove with any easily automated method, by camouflaging their own online presence within common human-generated text data created during this time period, let’s say, the internet marketing campaign of a major Hollywood blockbuster?

    Since scrapers do not understand context, by creating shitposts in a similar format to, let’s say, the social media account of an A-list celebrity starring in this hypothetical film being promoted (ideally, it would be someone who no longer has a major social media presence, to avoid shitpost data dilution), whenever an LLM aligned on a reward model built on said dataset is prompted for an impression of this celebrity, it’s likely that shitposts in the same format would be generated instead, with no one being the wiser.

    That would be pretty funny.

    Again, this is entirely hypothetical, of course.



  • The precedent in this case already exists in Midler v. Ford Motor Co., in which Academy Award-nominated actress and singer Bette Midler sued Ford after Ford hired musical impersonators to sing famous songs for their commercials.

    The court ultimately ruled in favor of Midler, because it was found that Ford gave clear instructions to the impersonating actress to sound as much like Midler as possible, and the ruling was that a voice, although not copyrightable, still constitutes part of a person’s distinct identity and is protected against unauthorized use. (Outside of satire, of course, since I doubt someone like Trump would be above suing people for making fun of him.)

    I think Scarlett Johansson has a case here, but it really hinges on whether or not OpenAI actively gave the instruction specifically to impersonate Scarlett’s voice in “Her”, or if they used her voice inside the training data at all, since there is a difference between the “Sky” voice and the voice of Scarlett Johansson.

    But then again, what do I know, I’m just here to shitpost and promote “Barbie”.






  • There is an interesting and almost universal phenomenon on Reddit: every time a subreddit gets past about 40,000 subscribers, the discussion quality immediately drops off a cliff, unless extremely harsh moderation policies are implemented to explicitly weed out low-effort content, which brings its own set of problems.

    My theory on why this occurs is the scaling power of moderation. I think you computer people are probably very familiar with the concept of scalability, and that size is its own challenge at the hyperscale. So for a centralized system like Twitter or Instagram or Facebook, moderation can only scale vertically, so a huge moderation team is needed just to contend with the scale of these platforms, which also creates the need for personalized recommendation algorithms to promote things that are actually interesting to individual users.

    Reddit was able to partially avoid this phenomenon with the subreddit system, which means everyone was able to effectively manage their own smaller subgroups that share common interests, without intervention from the site admins/mods, to achieve a form of pseudo-horizontal scaling. You can also see the success of that with Facebook Groups, which are one of the few reasons why people still use Facebook for social media even though they do not want to interact with the current Facebook audience.

    Lemmy, and the rest of the fediverse platforms, would suffer from this problem even less, since every group admin can now be completely independent of one another, which means that real horizontal scaling can be achieved, hopefully preserving the discussion quality to a degree as it grows.



  • Doesn’t really matter if they open sourced it, since many Reddit alternatives over the years have been open source: Voat, Ruqqus, Raddle. It doesn’t really make a difference, since they all failed one way or another. They either never hit that critical self-sustaining mass of users, or they attracted the exact wrong type of users, who drove out any reasonable users there.

    Federation seems to be the only way to create that critical mass of users, and Lemmy is the only alternative that really succeeded (kbin is kinda…hanging on for dear life for various reasons but is alive only due to federation) precisely because it is not a website, but a platform inside of a greater ecosystem.

    All Discuit really has is a pretty UI, as it is nowhere even near feature parity with a current defederated Lemmy instance, and Lemmy also has like a dozen different desktop and mobile UIs already.