What is GPT-2 and how do I install, configure and use it to take over the world?

Please, don’t fall under the spell of OpenAI’s nonsensical claims. Don’t buy the line that their release of GPT-2 to the public was for the benefit of mankind. OpenAI ultimately released GPT-2 (aka Generative Pre-trained Transformer 2), the AI linguistic model they once deemed “too dangerous” for the public to use, so they could transition from a non-profit to a commercial entity and rake in the dough with GPT-3 and beyond. Still, they left us with GPT-2, which is pretty cool and easy to set up and use.

What is GPT-2?

GPT-2, or Generative Pre-trained Transformer 2, is an unsupervised transformer language model. The corpus it was trained on, called WebText, contains slightly over 8 million documents, for a total of 40 GB of text, taken from URLs shared in Reddit submissions with at least 3 upvotes. First announced in February 2019, GPT-2 was initially withheld out of concern over potential misuse, including applications that could generate fake news.

How to install GPT-2

We will use Anaconda as the Python environment. To begin, open Anaconda and switch to the Environments tab. Click the arrow next to an environment and open a terminal.

Enter the following to create an Anaconda environment for running GPT-2. GPT-2 needs Python 3.x, so we will create a Python 3.x environment and name it “GPT2”.

conda create -n GPT2 python=3

Activate the environment using the following command:

conda activate GPT2

The GPT-2 project is available on Git at Clone the environment to your local machine using the following command. This will create a gpt-2 directory in the current folder.

git clone

Next, install the requirements specified in the Git GPT-2 repository’s requirements.txt. Change to the gpt-2 directory (created with the clone above) and enter this command.

pip install --upgrade -r requirements.txt

Now we need to download the pre-trained model. There are various models available ranging in size. They are named 124M, 355M, 774M and 1558M. The 774M model is about 3.1 gigabytes in size and the 1558M is about 6.2 GB. A download script is included in the gpt-2 repository. Install the model of your choice using the download script with this command. Be sure you are in the gpt-2 directory when executing the command.

python 774M

Install the remaining modules that are likely missing from the base Python install. Some of these should already have been installed by the requirements.txt step, but that step often fails.

conda install numpy

conda install tensorflow==1.15.0

conda install tensorflow-gpu==1.15.0

conda install regex==2017.4.5

conda install "fire>=0.1.3"

conda install requests==2.21.0

conda install tqdm==4.31.1

conda install _regex
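
Before opening Spyder, it can help to confirm which of the modules above actually import inside the GPT2 environment. The following is a minimal sketch, not part of the GPT-2 repository: the helper name missing_modules is made up for illustration, and the REQUIRED list assumes each package is imported under the same name it was installed with.

```python
import importlib.util

# Import names for the packages installed above (assumed to match the
# package names used in the conda commands).
REQUIRED = ["numpy", "tensorflow", "regex", "fire", "requests", "tqdm"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it from the terminal of the GPT2 environment. Any name it prints still needs a conda install.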

Select the GPT2 environment in Anaconda and install Spyder, the Python IDE, in the environment. Open Spyder and create a new project in the existing GPT2 directory (Projects > New Project). You should now see the GPT2 directories in your IDE.

We need to change one parameter before running the model. Open the file in the /src directory. Change the model_name to the name of the model you downloaded in the previous step (774M in this example).

How to run GPT-2 models

You can now run the script by right-clicking and choosing Run. If the script fails with missing-module errors, install the modules within Spyder using conda.

conda install numpy

Run the model and type in some text. You may have to wait a while for the response. Here’s an example.

The phrase entered:

“Donald Trump should never have been elected president. He is a liar and a narcissist and an embarrassment to the United States.”

And here’s the work of art the AI created.

He also happens to be married to a woman who is a homophobe, banned from the Vatican and for which he proudly boasts.

To his credit, Mr. Trump has found a stroke of genius in his narrative that the media is refusing to critique him because it lacks an agenda: no agenda allowed. Any other time, in any other country, the media would have been out front, dissecting Mr. Trump as it should be doing just that. But we have a party in the United States that forbids dissent except with Islamic references, like Megyn Kelly, Mrs. Clinton, Kanye West, Joan Rivers and everyone on Broadway at the time was unable to eject card-carrying members of Monotheism.

And you know what? Directing that criticism at Hillary Clinton is a big mistake, because your Democratic opponents are on the table. The GOP front-runner, who is actively attacking the press, is Jeff Sessions. Maybe Sessions might actually be able to achieve his hopes for bigger federal statues of him than Robert E. Lee, but then Trump would be shrinking the map and the moral authority of America, and drawing less persecution from the genocidal 8th Cavalry of Free Truth.

Hillary Clinton is also CEO-like, despite her record including pay transparency and ethics rules and her own controversies. This can be a big problem for various people because there is no gap between leaders and employees in any company. If the gaps between people who distract you from destined excellence and people promoting themselves or left in charge are too large, even if that leadership is awesome, they make it difficult to stay awesome and to establish a recognizable identity. And then you become Jackass-like like, looking for any opportunity to disagree with anyone.

One can remain in the party, find a middle ground, or take on the loudest voices. But it can’t have some former president ( who Trump says criticizes him because he is a disgrace to his party) whispering in your ear, then announce that President Traitor Anthony Weiner hid millions of emails from local cops knowing that Hillary chose to use the exact same setup, the Party didn’t notice and this is how the emails got out soon after.

Trump too has expended energy and energy resources on helping your causes, seeming distrustingly at you when you ask him to be quick to accept qualification because he just doesn’t like what you say. You are insane: he chooses his words carefully and clearly.

Not quite powerful enough to take over the world (yet) but still pretty cool.

How to retrain GPT-2

You can retrain GPT-2 with any text data: poems, novels, song lyrics, questions and answers, scientific papers, news, or articles. Compile all of your sources into a single file, using the following line to delimit each source.


The delimiter is not required for training but helps the model learn the formatting of the training data.
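
Compiling the sources by hand gets tedious with more than a few files, so here is a rough sketch of doing it in Python. The function name combine_sources is hypothetical, and the delimiter used here, <|endoftext|>, is an assumption (it is GPT-2's end-of-text token); substitute whatever delimiter line you prefer.

```python
from pathlib import Path

# Assumption: GPT-2's end-of-text token is used as the delimiter line.
DELIMITER = "<|endoftext|>"


def combine_sources(source_paths, output_path, delimiter=DELIMITER):
    """Write every source into output_path with one delimiter line between sources."""
    texts = [Path(p).read_text(encoding="utf-8").strip() for p in source_paths]
    combined = ("\n" + delimiter + "\n").join(texts) + "\n"
    Path(output_path).write_text(combined, encoding="utf-8")
    return len(texts)
```

For example, combine_sources(["poem1.txt", "poem2.txt"], "training_data.txt") would write both (made-up) files into training_data.txt with the delimiter on its own line between them.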

Move your training data into the src directory of your GPT-2 project.

Copy the script to the src directory.

Encode the training data like this. training_data.txt is the input training data and training_data_encoded.npz is the encoded dataset.

python training_data.txt training_data_encoded.npz

The model can be trained using this command.

python --dataset training_data_encoded.npz

Every 100 steps the script will output 3 samples. The loss will be displayed and should decrease over time. If the loss does not decrease, the model is not learning anything. To correct this, reduce the learning rate using the --learning_rate parameter.

python --dataset training_data_encoded.npz --batch_size 2 --learning_rate 0.0001

In the example above, we also increased the batch_size from 1 to 2 which should help speed things up (assuming you have enough RAM to handle the increased batch size).
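
Eyeballing whether the loss is falling can be unreliable because individual values bounce around. As a rough sketch (the function name and the default window of 5 are arbitrary choices for illustration, not anything from the training script), you could collect the avg values you see and compare the earliest ones with the latest ones:

```python
def loss_is_decreasing(avg_losses, window=5):
    """Compare the mean of the first and last `window` values of a loss history."""
    if len(avg_losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    early = sum(avg_losses[:window]) / window
    recent = sum(avg_losses[-window:]) / window
    return recent < early
```

If this returns False after a few hundred steps, that is a hint to try a lower learning rate as described above.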

To stop training, press Ctrl + C. The model automatically saves its progress every 1,000 steps.

You can resume training using the following:

python --dataset training_data_encoded.npz

The output can be deciphered as follows. Output example: [340 | 75.38] loss=0.66 avg=0.66:

340: The training step number. Think of it as a counter that increases by 1 after each step.

75.38: Time elapsed since the start of training, in seconds. You can use the first step as a reference to determine how long it takes to run one step.

loss and avg: The cross-entropy (log loss) for the current step and the average loss, respectively. You can use these to gauge how well your model is training. In theory, as the number of training steps increases, the loss should decrease until it converges at a certain value. The lower, the better.
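
If you save the training output to a file, the fields described above can be pulled out with a regular expression. This is a sketch that assumes every progress line has exactly the shape of the example, [340 | 75.38] loss=0.66 avg=0.66; the function names are made up for illustration.

```python
import re

# Matches lines shaped like: [340 | 75.38] loss=0.66 avg=0.66
LOG_LINE = re.compile(
    r"\[\s*(?P<step>\d+)\s*\|\s*(?P<elapsed>\d+(?:\.\d+)?)\s*\]"
    r"\s*loss=(?P<loss>\d+(?:\.\d+)?)\s*avg=(?P<avg>\d+(?:\.\d+)?)"
)


def parse_log_line(line):
    """Return step, elapsed seconds, loss and average loss, or None if no match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    return {
        "step": int(match.group("step")),
        "elapsed": float(match.group("elapsed")),
        "loss": float(match.group("loss")),
        "avg": float(match.group("avg")),
    }


def seconds_per_step(first, last):
    """Average seconds per step between two parsed log lines."""
    steps = last["step"] - first["step"]
    if steps <= 0:
        raise ValueError("last must come from a later step than first")
    return (last["elapsed"] - first["elapsed"]) / steps
```

Feeding the first and the most recent parsed lines to seconds_per_step gives a rough idea of how long the next 1,000 steps will take.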

How to run GPT-2 with new, retrained models

After you have stopped training using Ctrl + C, a checkpoint directory will be created in the src directory. In the src/models directory, create a new folder with a name like mydata.

Go to the src/checkpoint/run1 folder and copy the following files to the new directory. “xxx” is the step number that you stopped the training on.

  • checkpoint
  • model-xxx.data-00000-of-00001
  • model-xxx.index
  • model-xxx.meta

Next go to one of the original model folders (e.g. 774M) and copy the following files into your new directory:

  • encoder.json
  • hparams.json
  • vocab.bpe

You should have 7 files in your new folder.
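
The copying steps above can also be scripted. The sketch below is one possible way to do it, not something shipped with GPT-2: assemble_model_dir is a hypothetical helper, and it assumes every file saved for a step is named model-<step>.<something>, so it copies whatever matches that pattern rather than a fixed list.

```python
import shutil
from pathlib import Path

# Vocabulary and hyperparameter files taken from the original model folder.
BASE_FILES = ["encoder.json", "hparams.json", "vocab.bpe"]


def assemble_model_dir(checkpoint_dir, base_model_dir, target_dir, step):
    """Copy the files for one training step plus the base model files into target_dir."""
    checkpoint_dir = Path(checkpoint_dir)
    base_model_dir = Path(base_model_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # The "checkpoint" file plus every file saved for this step (model-<step>.*).
    sources = [checkpoint_dir / "checkpoint"]
    sources += sorted(checkpoint_dir.glob(f"model-{step}.*"))
    sources += [base_model_dir / name for name in BASE_FILES]

    for source in sources:
        shutil.copy2(source, target_dir / source.name)
    return sorted(p.name for p in target_dir.iterdir())
```

For the example in this post, the call would look like assemble_model_dir("src/checkpoint/run1", "src/models/774M", "src/models/mydata", 340). The returned list lets you check that the new folder holds the number of files you expect.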

Edit the to change the model name to the name of your new directory.

Run the new model.