Large Language Models and Artificial Intelligence

Binjian Xin | 2023-03-13

Table of Contents

Overview
Engineering Implementation of Large Language Models
Prospect and Challenges

Overview

Technological Advancement

The emergence of new technologies leads to social progress.
Artificial intelligence is hailed as the electricity of the new era.
Jordan Tigani, “Big Data Is Dead”

Updating Scientific Concepts

Three areas have undergone huge, lasting, and profound changes.
The deeper the understanding of principles, the greater the impact of applications.

What is ChatGPT?

Chat Generative Pretrained Transformer

Essence: Intelligence transformed into computation
- Basic object of computation: the embedding space (embeddings)
- Machine learning methods
Characteristics
- Large scale
- Single method (deep learning Transformer architecture)
- Multilingual mode
- Strong artificial intelligence (AGI?)
Open source and openness
- Transferable: image, control
- Serendipity

Corpus, training samples
- Up to 2003: 5 EB (exabytes) in total; by 2013: 5 EB every 2 days (1 EB = $10^9$ GB, 1 zettabyte = $10^{12}$ GB; “billions and billions”, Carl Sagan)
- Model, computational power
- Insufficient training
- The scale is necessary, but it is likely not sufficient
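The unit conversions quoted in the corpus bullet can be checked with a few lines of arithmetic. This is only a sketch, assuming decimal (SI) units; the helper name `to_gb` is illustrative and not from the talk.

```python
# Decimal (SI) storage units expressed in bytes.
GB = 10**9
EB = 10**18
ZB = 10**21


def to_gb(n_bytes: int) -> int:
    """Convert a byte count to whole gigabytes."""
    return n_bytes // GB


eb_in_gb = to_gb(EB)  # gigabytes in one exabyte
zb_in_gb = to_gb(ZB)  # gigabytes in one zettabyte

# "5 EB every 2 days" expressed as a per-day rate in GB.
gb_per_day = to_gb(5 * EB) / 2

print(eb_in_gb, zb_in_gb, gb_per_day)
```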
Single method (artificial intelligence, machine learning, deep learning, large neural network model, computational model): already existed in the 1990s; what changed was computational power and the rise of the Internet
- The engineering principles of the implementation are completely clear; the results still need to be interpreted and analyzed, and they are disputed
- Bet, stake, confidence, courage, faith
Serendipity
- Hardware lottery GPU
  - 1990 64-node computer network, Jeff Dean, Yoshua Bengio
- Experts: Li Feifei, Hinton, Bengio, LeCun
- The inevitable in the accidental: The biochemical origin of life, the origin of eukaryotes, the origin of language (200,000 years ago); evolution drives exponential growth
Engineering implementation understanding
- Visualization, animation (Jay Alammar, Lilian Weng, Christopher Potts)
- Peeling an onion, layer by layer

Ilya Sutskever NIPS 2015

If the dataset is large enough
And train a large neural network
You will definitely succeed!

Large Language Models

GPT Series
- GPT2 (1.5B), GPT3 (175B), InstructGPT (Alignment, RLHF), ChatGPT (Data collection differences), GPT4 (?) 👉 NanoGPT (Andrej Karpathy)
  - ChatGPT for Slack
    Neural Network as a Large Language Model

Large Language Models and Training Computational Power

Energy Density Improvement of Lithium Batteries

Improvement in Large Language Model Capabilities

Microsoft invests in OpenAI
Competition: Microsoft (Sydney), Google (LaMDA, Bard), Meta (Galactica, LLaMA), GPT4
Intelligence, Agency, Sentience, Consciousness, Will

ChatGPT’s False Promises

It can be used to solve problems, but its concept of language and knowledge is fundamentally flawed.

The so-called revolutionary advances in artificial intelligence are cause for both concern and optimism: optimism because intelligence can be used to solve problems, concern because the concept of language and knowledge underlying this approach is fundamentally flawed.

This machine learning method integrates these flawed concepts into our technology and products, thereby devaluing our science and ethics. The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data and extrapolating the most likely conversational response or most probable answer to a scientific question. On the contrary, the human mind is a surprisingly efficient and even elegant system that operates with small amounts of information; it seeks not to infer brute correlations among data points but to create explanations.

Criticism: Oxford Summerfield Lab: “Like others, Chomsky pits “pattern matching” vs. “understanding”. this is a sort of neo-dualism: it diminishes computation by asserting that it lacks some intangible quality (as we might diminish other minds by assuming they lack subjectivity)”

From a Buddhist perspective, dualism exaggerates “self-nature” and becomes attached to it.

Yoshua Bengio

ChatGPT is impressive, but scientifically it is just a small step; at best it is an engineering advance.

Engineering Implementation of Large Language Models

Use Cases

Neural Network as a Large Language Model

Language Encoding Models: Morphemes and n-grams
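As a rough illustration of what an n-gram model counts, the sketch below builds bigram statistics from a toy corpus and estimates a next-token probability. The corpus, the whitespace "tokenizer", and the function names are assumptions made for the example, not material from the talk.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical toy corpus; whitespace splitting stands in for a real tokenizer.
tokens = "the cat sat on the mat the cat ran".split()

bigrams = ngrams(tokens, 2)
bigram_counts = Counter(bigrams)
# How often each token appears as the first element of a bigram.
context_counts = Counter(first for first, _ in bigrams)


def next_token_prob(prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev) from bigram counts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, nxt)] / context_counts[prev]


print(bigram_counts.most_common(3))
print(next_token_prob("the", "cat"))
```

An n-gram model of this kind only sees a fixed window of preceding tokens, which is one reason the talk moves on to embeddings and attention.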

Computational Objects in GPT: Embeddings

Embedding (encoding of words/morphemes)
- Independent semantics, repeated in different positions in sentences/texts, reusable variables
- Corresponds to qualia: the clustering of concepts (e.g. colors) in consciousness; language is just an interface
Mutual relationships are confirmed by calculation.
Learn through training samples, collect semantics determined by syntax
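One common way to "confirm relationships by calculation" is cosine similarity between embedding vectors. The following is a minimal sketch under stated assumptions: the three-dimensional vectors are invented for illustration (trained embeddings have hundreds or thousands of dimensions), and `most_similar` is a hypothetical helper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real ones are learned from training samples.
embeddings = {
    "red": [0.9, 0.1, 0.0],
    "crimson": [0.8, 0.2, 0.1],
    "tuesday": [0.0, 0.1, 0.9],
}


def most_similar(word):
    """Return the other vocabulary word whose embedding is closest to `word`."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))


print(most_similar("red"))
```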

Operations on Embeddings

The data (words) are themselves structured: mutual relationships, frequency of occurrence, similarity, commutativity, and position (grammar, syntax) all carry meaning. Neural networks express this in a distributed way: relationships between concepts, and operations on them (neural impulse conduction).

“There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern of its leaves. We try to copy these patterns in our lives and our society, seeking the rhythms, the dances, the forms that comfort. Yet, it is possible to see peril in the finding of ultimate perfection. It is clear that the ultimate pattern contains its own fixity. In such perfection, all things move toward death.” ~ Dune (1965)

Clustering of Embeddings

Embedding in Images

Image embedding encoding and decoding, obtained through DCGAN training
Interpolation of embedding parameters: continuous change of images (male -> female)
Vector operation of embedding: modification of images
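The two operations listed above, interpolation and vector arithmetic, can be sketched on plain lists. This is an illustration only: the latent codes are made-up three-dimensional vectors, and the DCGAN generator that would decode each code into an image is assumed and not shown.

```python
def lerp(a, b, t):
    """Linearly interpolate between latent vectors a and b for t in [0, 1]."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical latent codes for two images.
z_start = [0.0, 1.0, -1.0]
z_end = [1.0, 0.0, 1.0]

# Five evenly spaced codes; decoding each would give a gradual transition.
path = [lerp(z_start, z_end, k / 4) for k in range(5)]

# Attribute direction: code with the attribute minus code without it.
z_with = [1.0, 2.0, 0.0]
z_without = [0.0, 2.0, 0.0]
direction = sub(z_with, z_without)

# Adding the direction to another code "edits" that attribute.
z_edited = add(z_start, direction)

print(path)
print(z_edited)
```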

GPT3 Training

GPT3 Sample Input

GPT3 Inference

GPT3 Context and Embedding

GPT3 and Transformer

GPT3 Applications

ChatGPT

GPT3.5: Codex
Supervised learning, fine-tuning
Reinforcement learning (PPO) constructs a reward function
Train improved models using reinforcement learning
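For orientation, the clipped surrogate objective that PPO maximizes can be written in a few lines. This is a sketch of the standard PPO objective, not of ChatGPT's actual training code, which the slides do not describe; the sample ratios and advantages are invented, and in an RLHF setting the advantages would be derived from the learned reward function.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios: new-policy probability / old-policy probability for each action.
    advantages: advantage estimate for each action.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Taking the minimum removes the incentive to move the policy
        # far from the old one in a single update.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# Hypothetical batch of three sampled actions.
ratios = [1.5, 0.5, 1.0]
advantages = [1.0, -1.0, 2.0]
objective = ppo_clipped_objective(ratios, advantages)
print(objective)
```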

Emergent Behavior

Application and Deployment

Prompt Engineering
LLaMA Remakes GPT (Stanford Alpaca 7B, $100)
- Train by API comparison ➡ Business model?
LLaMA (7B) Raspberry Pi port (4GB, 10sec/token)

Transformer

Structure: Less inductive bias, better generalization
- Attention: Self-attention, cross-attention, multi-head self-attention
- MLP, multilayer perceptron
- Residual structure
Requires a large number of training samples
Network scale and dataset
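The attention and residual components named above can be sketched without any library. This is a deliberately reduced version under stated assumptions: a single head, no learned query/key/value projections (Q = K = V = the input), a causal mask as in GPT-style decoders, and no MLP or layer normalization; the input vectors are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x, causal=True):
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # With a causal mask, position i only attends to positions 0..i.
        visible = x[: i + 1] if causal else x
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out


def residual_block(x):
    """Residual structure: add the attention output back onto the input."""
    attended = self_attention(x)
    return [[a + b for a, b in zip(row, att)] for row, att in zip(x, attended)]


# Hypothetical sequence of three 2-dimensional token embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(x)
block_out = residual_block(x)
print(attended)
print(block_out)
```

Because nothing here is hard-wired about word order or locality, the structure carries little inductive bias, which is consistent with the point that it needs a large number of training samples.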

Disputes

ChatGPT is a Blurry JPEG of Reality

Similar to lossy JPEG compression of an original image
Inaccurate description of reality leads to distortion and misinformation
Model hallucination problem contaminates corpus and information

GPT4 and the Uncharted Territories of Language

“They (LLM) could also create new ethical, social, and cultural challenges that require careful reflection and regulation. How we use this technology will depend on how we recognize its implications for ourselves and others.

But when we let GPT4 do this for us, are we not abdicating our intelligence? Are we not letting go of our ability to choose, to pick out, to read? Are we not becoming passive consumers of language instead of active producers?”

GPT4 Response prompted by Jeremy Howard on 2023.02.23

“This technology is a form of “Artificial Intelligence”. “Intelligence” derives from inter- (“between”) and legere (“to choose, pick out, read”). To be intelligent, then, is to be able to choose between things, to pick out what matters, to read what is written. Intelligence is not just a quantity or a quality; it is an activity, a process, a practice. It is something that we do with our minds and our words.

But when we let GPT 4 do this for us, are we not abdicating our intelligence? Are we not letting go of our ability to choose, to pick out, to read? Are we not becoming passive consumers of language instead of active producers?”

Jeremy Howard 2023.02.23 GPT 4 and the Uncharted Territories of Language

“The limits of my language mean the limits of my world.” — Ludwig Wittgenstein


Intelligence and Coherence Issues

The Higher the Intelligence, the More Chaotic

Coherence of Neural Networks

Prospect and Challenges

Efficiency, openness, origin, effectiveness, synthesis
- Retrieval-based (search-based) natural language processing
“Last Mile” of Large Language Models
Network structure understanding
- Maintenance, efficient updates
Disadvantages
- Long paragraphs
- Long logical reasoning (chain-of-thought reasoning) 👉 Reinforcement learning?
- Contamination of natural language corpus space
