DialoGPT: Introduction and Implementation Guide

Ganesh Sharma
Feb 8, 2024

Updated: Feb 12, 2024

Introduction

Chatbots are computer programs designed to simulate conversation with human users, typically through text or voice interfaces. In this tutorial, we will discuss a conversational model named DialoGPT.

DialoGPT is a GPT-2-based model from Microsoft for generating responses in multi-turn conversations. In human evaluation, its responses were rated as comparable in quality to human responses in a single-turn conversation test. The model was trained on over 147 million conversation-like exchanges extracted from Reddit discussion threads posted between 2005 and 2017.

Implementation

Want to see how a chat with an AI model works in Python? Let's walk through the implementation step by step.

Alright, before we set off, we need to make sure we have the right gear. In this case, we need AutoModelForCausalLM and AutoTokenizer from the transformers library, plus torch.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

We will use DialoGPT to have a conversation. We start by loading the model and its tokenizer, which converts our text into the token IDs the model works with and converts the model's output back into text.

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
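The string passed to from_pretrained selects which checkpoint is downloaded. As a small sketch, assuming the three sizes published under the microsoft namespace on the Hugging Face Hub (small, medium and large), a helper can build and validate the name. The function checkpoint_name is hypothetical and not part of transformers.

```python
# Hypothetical helper (not part of transformers): builds a DialoGPT checkpoint name.
VALID_SIZES = ("small", "medium", "large")  # assumed set of published sizes


def checkpoint_name(size: str = "medium") -> str:
    """Return the Hub identifier for a DialoGPT checkpoint of the given size."""
    size = size.lower()
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return f"microsoft/DialoGPT-{size}"


print(checkpoint_name("small"))  # microsoft/DialoGPT-small
```

The returned string can be passed to both AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained. Smaller checkpoints load faster and need less memory, usually at some cost in reply quality.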

Now, we're going to chat for five turns. On each turn, we first read a line of input from the user, append the end-of-sequence token so the model knows where the turn ends, and encode the result with the tokenizer as a PyTorch tensor.

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')
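To picture what the model receives, it can help to look at the text form of a conversation before tokenization. The sketch below is plain Python and does not call the tokenizer. It assumes the end-of-sequence string is `<|endoftext|>` (the value of tokenizer.eos_token for GPT-2-style tokenizers), and build_dialogue_text is a hypothetical name used only for illustration.

```python
EOS = "<|endoftext|>"  # assumed value of tokenizer.eos_token


def build_dialogue_text(turns, eos=EOS):
    """Join dialogue turns into one string, ending every turn with the EOS marker."""
    return "".join(turn + eos for turn in turns)


turns = ["Hi there!", "Hello! How are you?", "Fine, thanks."]
print(build_dialogue_text(turns))
# Hi there!<|endoftext|>Hello! How are you?<|endoftext|>Fine, thanks.<|endoftext|>
```

Every turn, whether from the user or the bot, ends with the same marker, so the model sees the conversation as one flat sequence of turns.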

We'll combine the user's input with the history of our conversation so far. This history gives the model the context of the conversation. On the first turn there is no history yet, so the new input is used on its own; on later turns we append it to the existing history.

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids
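Because the history grows on every turn, a long conversation can eventually exceed the length the model can handle. One possible remedy, sketched here under assumptions, is to drop the oldest turns first. Unlike the loop in this guide, which keeps the history as a single tensor, the sketch assumes each turn is stored as its own list of token IDs. The name trim_history is hypothetical.

```python
def trim_history(turns, max_tokens):
    """Drop the oldest turns until the total token count fits in max_tokens.

    `turns` is a list of token-ID lists, one list per dialogue turn.
    The most recent turn is always kept, even if it alone exceeds the budget.
    """
    kept = list(turns)
    while len(kept) > 1 and sum(len(t) for t in kept) > max_tokens:
        kept.pop(0)  # discard the oldest turn first
    return kept


history = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(trim_history(history, max_tokens=6))  # [[4, 5], [6, 7, 8, 9]]
```

Trimming at turn boundaries keeps each remaining turn intact. The kept turns can then be flattened into one list and wrapped with torch.tensor before being passed to the model.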

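It's time for the model to shine! We'll use it to generate a response from the combined input and conversation history. The max_length argument caps the total sequence, history plus reply, at 1000 tokens, which also keeps the reply from running on. Because the GPT-2 tokenizer that DialoGPT uses has no dedicated padding token, we pass the end-of-sequence token ID as pad_token_id.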

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
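With no further arguments, generate typically falls back to greedy decoding, which can produce repetitive replies. A common variation is to sample from the model's distribution instead. The keyword names below are standard generate arguments, but the values are illustrative assumptions, not tuned settings or recommendations from the DialoGPT authors.

```python
# Illustrative sampling settings for model.generate; the values are assumptions.
sampling_kwargs = {
    "max_length": 1000,         # cap on history + reply, as in the loop above
    "do_sample": True,          # sample instead of always taking the most likely token
    "top_k": 50,                # consider only the 50 most likely next tokens
    "top_p": 0.95,              # then keep the smallest set reaching 0.95 probability mass
    "temperature": 0.7,         # values below 1 sharpen the distribution
    "no_repeat_ngram_size": 3,  # forbid repeating any 3-token sequence
}

# Usage inside the loop (model, tokenizer and bot_input_ids come from the guide's code):
# chat_history_ids = model.generate(
#     bot_input_ids, pad_token_id=tokenizer.eos_token_id, **sampling_kwargs
# )
```

Sampling makes replies vary from run to run, so the same input will not always give the same answer.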

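Once we have the response, we'll decode it back into human-readable text. The output of generate contains the whole conversation so far, so we first slice off the tokens that were already in the input and decode only the new ones, skipping special tokens such as the end-of-sequence marker. Then we print the result as the bot's reply.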

    # pretty print last output tokens from bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
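The slicing in that print call is easier to follow on a toy example. The sketch below uses plain Python lists as stand-ins for the tensors, with made-up token IDs; extract_reply is a hypothetical name.

```python
def extract_reply(full_output, prompt_length):
    """Return only the tokens that were generated after the prompt."""
    return full_output[prompt_length:]


prompt = [10, 11, 12]            # stand-in for bot_input_ids
output = prompt + [20, 21, 22]   # stand-in for what generate returns
print(extract_reply(output, len(prompt)))  # [20, 21, 22]
```

In the real code the same idea is applied along the last dimension of a 2-D tensor, which is why the slice starts at bot_input_ids.shape[-1].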

Putting it All Together


from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty print last output tokens from bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

Output

Advantages

Real-World Training Data: DialoGPT was trained on a large dataset of real conversations from Reddit, so it has been exposed to a wide range of topics and language styles that people use online.

Easy to Deploy and Extend: DialoGPT is open source and straightforward to run. It can be fine-tuned on a task-specific dataset, which makes it adaptable to different applications.

Building Block for Applications: DialoGPT serves as a foundation for developing conversational applications and methods. Its flexibility lets researchers and developers explore new possibilities in natural language processing.

Future Focus on Toxic Output: The creators acknowledge the importance of addressing toxic or harmful content generated by models like DialoGPT. They plan to focus on detecting and controlling such output, and mention reinforcement learning as a possible technique.

Limitations and Risks

Model Only Release: DialoGPT was released as a model only, without a decoding implementation, so choosing and implementing a decoding strategy is left to the user. With the transformers library, the generate method used above takes care of this step.

Offensive Output Potential: Despite efforts to remove offensive data before training, DialoGPT can still generate offensive responses. This may reflect biases present in the original data, including historical biases regarding gender and other factors.

Propensity for Unethical or Biased Responses: Responses generated by DialoGPT might tend to agree with unethical, biased, or offensive statements, including ideas that are harmful or discriminatory.

Lack of Human-Likeness Guarantee: While DialoGPT is designed to mimic human conversation, its responses will not always seem human-like. Some may come across as artificial or robotic.

Known Issues in State-of-the-Art Models: DialoGPT shares common issues with other advanced conversation models, such as generating inappropriate or disagreeable content. Researchers are actively working to address these challenges.

And there you have it! 🎉 We've reached the conclusion of our DialoGPT Introduction and Implementation Guide journey. We sincerely hope that this guide has provided you with valuable insights and practical knowledge.

Now armed with the understanding of DialoGPT, you've learned how to leverage this model to engage in dynamic conversations. By exploring its implementation using Python and the transformers library, you've gained hands-on experience in setting up and interacting with this powerful AI model.

As you venture forth, don't hesitate to experiment and integrate DialoGPT into your projects and applications.

Thank you for embarking on this journey with us! If you have any questions, feedback, or exciting experiences to share, feel free to reach out. Your continued exploration and engagement drive innovation in the field of natural language processing. Happy chatting with DialoGPT! 🤖💬