lledrtx@lemmy.world to Memes@lemmy.ml · 2 years ago

Glad this is becoming a meme

fedia.io

↑ 1.24K · 61 comments
  • dislocate_expansion@reddthat.com [bot] · ↑ 22 · 2 years ago

    Anyone know why most have a 2021 internet data cutoff?

    • Natanael@slrpnk.net · ↑ 19 · 2 years ago

      Training from scratch and retraining are expensive. Also, they want to avoid training on ML outputs; they want primarily human-made works as samples, and since the initial public release of LLMs it has become harder to build large datasets without ML-generated content in them.

      • Scrubbles@poptalk.scrubbles.tech · ↑ 13 · edited · 2 years ago

        There was a good paper that came out recently saying that training on ML-generated data will result in a collapse of cohesion. It's going to be really interesting; I don't know if they'll ever be able to train as easily again.

      • Iron Lynx@lemmy.world · ↑ 4 · 2 years ago

        I recall spotting a few things about image generators having their training data contaminated with generated images, and the output becoming significantly worse. So yeah, I guess LLMs and IGAs need natural sources, or they get more inbred than the Habsburgs.

      • TurtleJoe@lemmy.world · ↑ 3 · ↓ 3 · 2 years ago

        I think it’s telling that they acknowledge that the stuff their bots churn out is often such garbage that training their bots on it would ruin them.

    • Donkter@lemmy.world · ↑ 7 · 2 years ago

      I think it's just that most are based on ChatGPT, which cuts off at 2021.

    • trashcan@sh.itjust.works · ↑ 3 · 2 years ago

      Hey, did you know your profile is set to appear as a bot and as a result many may be filtering your posts and comments? You can change this in your Lemmy settings.

      Unless you are a bot… in which case, where did you get your data?

      • dislocate_expansion@reddthat.com [bot] · ↑ 4 · 2 years ago

        The data wasn't stolen; I can at least assure you of that.

        • trashcan@sh.itjust.works · ↑ 1 · 2 years ago

          You paid Huffman?

    • potustheplant@feddit.nl · ↑ 2 · ↓ 1 · 2 years ago

      Where did you get that from? At least ChatGPT isn't limited to data from 2021. I haven't researched other models.

      • RatBin@lemmy.world · ↑ 8 · edited · 2 years ago

        deleted by creator

        • dislocate_expansion@reddthat.com [bot] · ↑ 4 · 2 years ago

          Are you sure those aren't trained up to 2021, frozen, and then fine-tuned on later data?

          • RatBin@lemmy.world · ↑ 2 · edited · 2 years ago

            deleted by creator

      • dislocate_expansion@reddthat.com [bot] · ↑ 4 · 2 years ago

        Yeah, GPT-3.5 and some other FOSS models also say 2021.

        • potustheplant@feddit.nl · ↑ 5 · 2 years ago

          OpenAI stated in a tweet a few months ago that the limitation is no longer in place.

          • webghost0101@sopuli.xyz · ↑ 4 · 2 years ago

            To be fair, this tweet doesn't say anything about training data, only that it can, in theory, use present-day data if it looks it up online.

            For GPT-4, I think it was initially trained on data up to 2021, but it has since received updates that used data up to December 2023 in training. It "knows" this data and does not need to look it up.

            Whether they managed to further train the initial GPT-4 model to do so or added something they trained separately is probably a trade secret.

          • dislocate_expansion@reddthat.com [bot] · ↑ 2 · 2 years ago

            Thanks!

Memes@lemmy.ml

Rules:

  1. Be civil and nice.
  2. Try not to repost excessively; as a rule of thumb, wait at least 2 months before reposting if you have to.
Visibility: Public

This community can be federated to other instances and be posted/commented in by their users.

  • 1.48K users / day
  • 3.04K users / week
  • 7.67K users / month
  • 16.6K users / 6 months
  • 1 local subscriber
  • 55.1K subscribers
  • 14.3K Posts
  • 227K Comments
  • mods:
  • ghost_laptop@lemmy.ml
  • Cyclohexane@lemmy.ml
  • Arthur Besse@lemmy.ml