How ChatGPT works its magic with language
It's all thanks to the so-called Transformer architecture, a model design that teaches computers to understand and generate human language. In its original form it consists of two main components: the encoder and the decoder. Both parts work together to understand and produce text. OpenAI trained ChatGPT on terabytes of text found on the Internet.
The encoder receives a string of words as input and converts it into a form the computer can work with. Every word of our language is translated into a point in a multidimensional space. This is done using coordinates, just as we can locate a point on a map (a two-dimensional space). We call this space the embedding space. Because many words are combinations of commonly used word parts, called tokens, it is not necessary to represent each word separately. ChatGPT has a vocabulary of 50,257 tokens. Initially, the model has no idea where best to place each token. As the model trains, the coordinates of the tokens are adjusted so that they end up next to similar tokens. Tokens that usually occur together are placed near each other. The model uses linear algebra to process these points and build a meaningful representation of the input.
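To make the idea of an embedding space concrete, here is a minimal sketch in Python. The three tokens and their two-dimensional coordinates are invented for illustration; a real model learns its coordinates during training and uses thousands of dimensions.

```python
import math

# Hypothetical 2-D embedding table (made-up values, not taken from any real model).
EMBEDDINGS = {
    "cat": (0.9, 0.8),
    "dog": (0.8, 0.9),
    "car": (-0.7, 0.2),
}

def distance(a, b):
    """Euclidean distance between the points of two tokens."""
    return math.dist(EMBEDDINGS[a], EMBEDDINGS[b])

def nearest(token):
    """Return the other token whose point lies closest to `token`."""
    others = [t for t in EMBEDDINGS if t != token]
    return min(others, key=lambda t: distance(token, t))

if __name__ == "__main__":
    print(nearest("cat"))
```

In this toy table "cat" and "dog" were deliberately placed close together, so looking up the nearest neighbour of one returns the other, which is the property training is meant to produce.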
Smoother flow
These manipulations are steered by a mechanism called self-attention. It allows the Transformer model to assign different weights to different parts of the input. In human terms, this means the model can focus on specific words in a sentence to better understand its meaning. It's like trying to pay attention to the most important parts of a conversation during a crowded meeting. The model also uses residual connections. These can be compared to walking through a series of rooms and picking up a piece of information in each room while keeping everything you already collected. This ensures a smoother flow of information and helps the model understand complex structures in texts.
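The sketch below shows, in plain Python, roughly what one self-attention step with a residual connection could look like. It is a simplification under stated assumptions: the token vectors are used directly as queries, keys and values, whereas a real Transformer first multiplies them by learned weight matrices and runs several attention heads in parallel.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention plus a residual connection.

    Assumption: queries, keys and values are the input vectors themselves
    (no learned projections), to keep the example short.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # How strongly does this token "look at" every token in the input?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in vectors]
        weights = softmax(scores)
        # Weighted mix of all token vectors.
        mixed = [sum(w * v[d] for w, v in zip(weights, vectors))
                 for d in range(dim)]
        # Residual connection: add the original vector back to the mix.
        outputs.append([x + m for x, m in zip(q, mixed)])
        all_weights.append(weights)
    return outputs, all_weights

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, weights = self_attention(tokens)
    print(weights[0])
```

Each row of weights says how much attention one token pays to every token in the input, and the residual addition is what lets the original information pass through unchanged alongside the newly mixed-in context.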
The decoder, in turn, uses the information it obtained from the encoder to generate sentences. Here, probability plays a crucial role: for each token in the learned vocabulary, the model calculates the probability that it will appear next in the output. While the model is being trained, words are hidden in texts and the decoder has to predict which word belongs there. It then checks whether its prediction is correct. If the prediction is significantly wrong, the model parameters are adjusted to improve it. This is repeated over many iterations during training, because an adjustment that fixes one prediction can introduce errors elsewhere.
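The predict, check and adjust cycle described above can be sketched as follows. The three candidate tokens and their scores are made up, and the adjustment is applied directly to the scores; in a real model the same correction signal is passed back through the network to update its billions of parameters.

```python
import math

def softmax(logits):
    """Convert raw scores per token into probabilities that sum to 1."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def loss(logits, target):
    """Cross-entropy: large when the hidden word received low probability."""
    return -math.log(softmax(logits)[target])

def training_step(logits, target, lr=1.0):
    """One gradient step on the scores (simplifying assumption: we adjust
    the scores themselves rather than the weights that produce them)."""
    probs = softmax(logits)
    return {t: v - lr * (probs[t] - (1.0 if t == target else 0.0))
            for t, v in logits.items()}

if __name__ == "__main__":
    # Hypothetical scores for the hidden word in "the cat sat on the ___".
    logits = {"mat": 1.0, "moon": 2.0, "dog": 0.5}
    for _ in range(3):
        print(round(loss(logits, "mat"), 3))
        logits = training_step(logits, "mat")
```

Every step shifts probability toward the word that was actually hidden, so the printed loss shrinks from one iteration to the next.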
The trained model has billions of learned parameters that allow it to understand and answer a given question. The most widely used network, GPT-3.5, is reported to have about 175 billion parameters and represents each token in a 12,288-dimensional space. Each coordinate carries some information about the token, just as a home can be described by characteristics such as location, type and number of bedrooms. In a language model, however, it is not possible to say exactly what kind of information each coordinate represents. What we do know is that tokens with similar meanings have very similar coordinates and lie close to the tokens they often co-occur with.
'take a deep breath'
The chatbot then chooses among these neighbouring tokens when formulating its answer. It does not always pick the most likely token, which adds variety to its answers but can of course also make it wrong. By rephrasing your question, you can often steer it in the desired direction. Users have found that additions such as "take a deep breath" improve answers to math questions. One possible explanation is that during training the model also saw questions from mathematics forums, where answers were often preceded by this expression to prepare the reader for the difficult calculation that follows.
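Choosing a token that is not always the most likely one is commonly done by sampling with a "temperature" setting. The sketch below illustrates that general technique with invented probabilities; it is not a description of ChatGPT's actual settings, which the article does not specify.

```python
import math
import random

def sample_token(probs, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its (rescaled) probability.

    temperature < 1 favours the most likely token; temperature > 1 adds variety.
    Assumes every probability is greater than zero.
    """
    tokens = list(probs)
    # Rescale the log-probabilities, then turn them back into weights.
    scaled = [math.log(probs[t]) / temperature for t in tokens]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]

if __name__ == "__main__":
    # Hypothetical probabilities for the token following "2 + 2 =".
    probs = {"4": 0.6, "5": 0.3, "3": 0.1}
    rng = random.Random(0)
    print([sample_token(probs, 1.0, rng) for _ in range(10)])
    print([sample_token(probs, 0.01, rng) for _ in range(10)])
```

At a temperature of 1.0 the less likely tokens still show up now and then, which is where both the variety and the occasional mistake come from; at a very low temperature the model practically always returns the top choice.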
So, next time you talk to ChatGPT, you can marvel not only at the beauty of its response, but also at the technological masterpiece at work behind your screen. Magic with language is done with mathematics!