People are talking about the new Llama 3.3 70b release, which has generally better performance than Llama 3.1 (approaching 3.1’s 405b performance): https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3

However, something to note:

Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.

Is this the end of open-weight pretrained models from Meta, or is Llama 3.3 70b instruct just a better-instruction-tuned version of a 3.1 pretrained model?

Comparing the model cards: 3.1: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md 3.3: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md

The same knowledge cutoff, same amount of training data, and same training time give me hope that it’s just a better finetune of maybe Llama 3.1 405b.

  • geoff@lemm.ee
    link
    fedilink
    English
    arrow-up
    6
    ·
    5 days ago

    This is making me realize that I don’t fully understand the relationship between “instruction-tuned” and “pre-trained”. I thought instruction tuning was a form of fine-tuning, and that fine-tuning comes after the primary training of the model.

    • hendrik@palaver.p3x.de
      link
      fedilink
      English
      arrow-up
      8
      ·
      edit-2
      5 days ago

      A base-model / pre-trained is fed with a large dataset of random text files. Books, Wikipedia etc. After that the model can autocomplete text. And it has learned language and concepts about the world. But it won’t answer your questions. It’ll refine them, or think you’re writing an email or long list of unanswered questions and write some more questions underneath, instead of engaging with you. Or think it’s writing a novel and autocomplete “…that’s what character asked while rolling their eyes.” Or something completely arbitrary like that.

      After that major first step it’ll get fine-tuned to some task. The procedure is the same, it’ll get fed different text in almost the same way. And this just continues the training. But now it’s text that tunes it to it’s role. For example be a Chatbot. It’ll get lots of text that is a question, then a special character/token and then an answer to the question. And it’ll learn to reply with an (correct) answer if you put in a question and that token. It’ll probably also be fine-tuned to write dialogue as a Chatbot. And follow instructions. (And refuse some things and speak more unbiased, be nice…)

      You can also put in domain-specific data, make it learn/focus on medicine… I think that’s also called fine-tuning. But as far as I understand teaching knowledge with arbitrary data comes before teaching/tuning it to follow instructions, or it might forget that.

      I think instruction tuning is a form of fine-tuning. It’s just called that to distinguish it from other forms of fine-tuning. But I’m not really an expert on any of this.

    • ᗪᗩᗰᑎ@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      4
      arrow-down
      2
      ·
      5 days ago

      I was also not sure what this meant, so I asked Google’s Gemini, and I think this clears it up for me:


      This means that the creators of Llama 3.3 have chosen to release only the version of the model that has been fine-tuned for following instructions. They are not making the original, “pretrained” version available.

      Here’s a breakdown of why this is significant:

      • Pretrained models: These are large language models (LLMs) trained on a massive dataset of text and code. They have learned to predict the next word in a sequence, and in doing so, have developed a broad understanding of language and a wide range of general knowledge. However, they may not be very good at following instructions or performing specific tasks.
      • Instruction-tuned models: These models are further trained on a dataset of instructions and desired outputs. This fine-tuning process teaches them to follow instructions more effectively, generate more relevant and helpful responses, and perform specific tasks with greater accuracy.

      In the case of Llama 3.3 70B, you only have access to the model that has already been optimized for following instructions and engaging in dialogue. You cannot access the initial pretrained model that was used as the foundation for this instruction-tuned version.

      Possible reasons why Meta (the creators of Llama) might have made this decision:

      • Focus on specific use cases: By releasing only the instruction-tuned model, Meta might be encouraging developers to use Llama 3.3 for assistant-like chat applications and other tasks where following instructions is crucial.
      • Competitive advantage: The pretrained model might be considered more valuable intellectual property, and Meta may want to keep it private to maintain a competitive advantage.
      • Safety and responsibility: Releasing the pretrained model could potentially lead to its misuse for generating harmful or misleading content. By releasing only the instruction-tuned version, Meta might be trying to mitigate these risks.

      Ultimately, the decision to release only the instruction-tuned model reflects Meta’s strategic goals for Llama 3.3 and their approach to responsible AI development.