ChatGPT Is a Blurry JPEG of the Web
date
Aug 2, 2023
slug
20230802
status
Published
tags
Technology
summary
There is nothing magical or mystical about writing, but it involves more than placing an existing document on an unreliable photocopier and pressing the Print button.
type
Post
ChatGPT is a blurry JPEG of the Web
1. ChatGPT is a lossy compression of all the text on the Web
2. Beware of the "beautiful blur"
3. "An original idea expressed poorly" beats "an unoriginal idea expressed clearly"
Summary:
1. ChatGPT is a lossy compression of all the text on the Web. We must keep this in mind at all times and be wary of treating the "beautiful blur" as accurate information that sways our judgment and decisions.
2. Discover "original ideas" through the struggle of clumsy expression, while improving your own ability to express them and polishing them into jade.
Train imagination, decision-making, and communication to build a competitive edge that machines cannot have.
Translation:
In 2013, workers at a German construction company noticed something odd about their Xerox photocopier: when they made a copy of the floor plan of a house, the copy differed from the original in a subtle but significant way. In the original floor plan, each of the house's three rooms was accompanied by a rectangle specifying its area: the rooms were 14.13, 21.11, and 17.42 square metres, respectively. However, in the photocopy, all three rooms were labelled as being 14.13 square metres in size. The company contacted the computer scientist David Kriesel to investigate this seemingly inconceivable result. They needed a computer scientist because a modern Xerox photocopier doesn't use the physical xerographic process popularized in the nineteen-sixties. Instead, it scans the document digitally, and then prints the resulting image file. Combine that with the fact that virtually every digital image file is compressed to save space, and a solution to the mystery begins to suggest itself.
Compressing a file requires two steps: first, the encoding, during which the file is converted into a more compact format, and then the decoding, whereby the process is reversed. If the restored file is identical to the original, then the compression process is described as lossless: no information has been discarded. By contrast, if the restored file is only an approximation of the original, the compression is described as lossy: some information has been discarded and is now unrecoverable. Lossless compression is what's typically used for text files and computer programs, because those are domains in which even a single incorrect character has the potential to be disastrous. Lossy compression is often used for photos, audio, and video in situations in which absolute accuracy isn't essential. Most of the time, we don't notice if a picture, song, or movie isn't perfectly reproduced. The loss in fidelity becomes more perceptible only as files are squeezed very tightly. In those cases, we notice what are known as compression artifacts: the fuzziness of the smallest JPEG and MPEG images, or the tinny sound of low-bit-rate MP3s.
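A minimal sketch of that two-step round trip, in Python, with the standard-library zlib module standing in for a lossless codec and a deliberately crude byte quantizer standing in for a lossy one (the quantizer is purely illustrative, not any real format):

```python
import zlib

text = b"Room areas: 14.13, 21.11, and 17.42 square metres."

# Lossless: encode, decode, and get back an identical copy.
encoded = zlib.compress(text)
decoded = zlib.decompress(encoded)
assert decoded == text  # no information has been discarded

# Lossy (toy stand-in, not a real codec): quantize each byte to a coarser
# scale on encoding, so decoding can only approximate the original.
def lossy_encode(data, step=16):
    return bytes(b // step for b in data)

def lossy_decode(data, step=16):
    return bytes(min(255, b * step + step // 2) for b in data)

restored = lossy_decode(lossy_encode(text))
assert restored != text  # some information is gone and cannot be recovered
```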
Xerox photocopiers use a lossy compression format known as JBIG2, designed for use with black-and-white images. To save space, the copier identifies similar-looking regions in the image and stores a single copy for all of them; when the file is decompressed, it uses that copy repeatedly to reconstruct the image. It turned out that the photocopier had judged the labels specifying the area of the rooms to be similar enough that it needed to store only one of them—14.13—and it reused that one for all three rooms when printing the floor plan.
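A toy sketch of that store-one-copy-and-reuse idea (not the actual JBIG2 codec; short strings stand in for scanned patches): regions that differ by no more than a small threshold get mapped to a single stored symbol, so near-duplicates come back identical after decoding, which is the same kind of failure the copier exhibited.

```python
# Toy symbol-dictionary compression in the spirit of JBIG2 (not the real codec):
# patches that look "close enough" share one stored copy.
def encode(patches, threshold=1):
    dictionary, indices = [], []
    for patch in patches:
        for i, stored in enumerate(dictionary):
            if sum(a != b for a, b in zip(patch, stored)) <= threshold:
                indices.append(i)          # reuse the existing symbol
                break
        else:
            dictionary.append(patch)       # store a brand-new symbol
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decode(dictionary, indices):
    return [dictionary[i] for i in indices]

# The second and third labels differ from the first by one character,
# so the encoder treats all three as the same symbol.
labels = ["14.13", "14.18", "14.12"]
dictionary, indices = encode(labels)
print(decode(dictionary, indices))  # ['14.13', '14.13', '14.13']
```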
The fact that Xerox photocopiers use a lossy compression format instead of a lossless one isn't, in itself, a problem. The problem is that the photocopiers were degrading the image in a subtle way, in which the compression artifacts weren't immediately recognizable. If the photocopier simply produced blurry printouts, everyone would know that they weren't accurate reproductions of the originals. What led to problems was the fact that the photocopier was producing numbers that were readable but incorrect; it made the copies seem accurate when they weren't. (In 2014, Xerox released a patch to correct this issue.)
I think that this incident with the Xerox photocopier is worth bearing in mind today, as we consider OpenAI's ChatGPT and other similar programs, which A.I. researchers call large language models. The resemblance between a photocopier and a large language model might not be immediately apparent—but consider the following scenario. Imagine that you're about to lose your access to the Internet forever. In preparation, you plan to create a compressed copy of all the text on the Web, so that you can store it on a private server. Unfortunately, your private server has only one per cent of the space needed; you can't use a lossless compression algorithm if you want everything to fit. Instead, you write a lossy algorithm that identifies statistical regularities in the text and stores them in a specialized file format. Because you have virtually unlimited computational power to throw at this task, your algorithm can identify extraordinarily nuanced statistical regularities, and this allows you to achieve the desired compression ratio of a hundred to one.
Now, losing your Internet access isn't quite so terrible; you've got all the information on the Web stored on your server. The only catch is that, because the text has been so highly compressed, you can't look for information by searching for an exact quote; you'll never get an exact match, because the words aren't what's being stored. To solve this problem, you create an interface that accepts queries in the form of questions and responds with answers that convey the gist of what you have on your server.
What I've described sounds a lot like ChatGPT, or most any other large language model. Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it's usually acceptable. You're still looking at a blurry JPEG, but the blurriness occurs in a way that doesn't make the picture as a whole look less sharp.
This analogy to lossy compression is not just a way to understand ChatGPT's facility at repackaging information found on the Web by using different words. It's also a way to understand the "hallucinations," or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone. These hallucinations are compression artifacts, but—like the incorrect labels generated by the Xerox photocopier—they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world. When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated.
This analogy makes even more sense when we remember that a common technique used by lossy compression algorithms is interpolation—that is, estimating what's missing by looking at what's on either side of the gap. When an image program is displaying a photo and has to reconstruct a pixel that was lost during the compression process, it looks at the nearby pixels and calculates the average. This is what ChatGPT does when it's prompted to describe, say, losing a sock in the dryer using the style of the Declaration of Independence: it is taking two points in "lexical space" and generating the text that would occupy the location between them. ("When in the Course of human events, it becomes necessary for one to separate his garments from their mates, in order to maintain the cleanliness and order thereof. . . .") ChatGPT is so good at this form of interpolation that people find it entertaining: they've discovered a "blur" tool for paragraphs instead of photos, and are having a blast playing with it.
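The pixel-averaging step is simple enough to show directly; a minimal sketch, assuming the missing value sits between two surviving neighbours:

```python
# Minimal sketch of interpolation: a pixel lost to compression (None) is
# estimated from the average of its surviving neighbours.
row = [100, 104, None, 112, 116]

def fill_gaps(values):
    filled = list(values)
    for i, v in enumerate(filled):
        if v is None:  # assumes the gap is interior, with neighbours on both sides
            filled[i] = (filled[i - 1] + filled[i + 1]) // 2
    return filled

print(fill_gaps(row))  # [100, 104, 108, 112, 116]
```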
Given that large language models like ChatGPT are often extolled as the cutting edge of artificial intelligence, it may sound dismissive—or at least deflating—to describe them as lossy text-compression algorithms. I do think that this perspective offers a useful corrective to the tendency to anthropomorphize large language models, but there is another aspect to the compression analogy that is worth considering. Since 2006, an A.I. researcher named Marcus Hutter has offered a cash reward—known as the Prize for Compressing Human Knowledge, or the Hutter Prize—to anyone who can losslessly compress a specific one-gigabyte snapshot of Wikipedia smaller than the previous prize-winner did. You have probably encountered files compressed using the zip file format. The zip format reduces Hutter's one-gigabyte file to about three hundred megabytes; the most recent prize-winner has managed to reduce it to a hundred and fifteen megabytes. This isn't just an exercise in smooshing. Hutter believes that better text compression will be instrumental in the creation of human-level artificial intelligence, in part because the greatest degree of compression can be achieved by understanding the text.
To grasp the proposed relationship between compression and understanding, imagine that you have a text file containing a million examples of addition, subtraction, multiplication, and division. Although any compression algorithm could reduce the size of this file, the way to achieve the greatest compression ratio would probably be to derive the principles of arithmetic and then write the code for a calculator program. Using a calculator, you could perfectly reconstruct not just the million examples in the file but any other example of arithmetic that you might encounter in the future. The same logic applies to the problem of compressing a slice of Wikipedia. If a compression program knows that force equals mass times acceleration, it can discard a lot of words when compressing the pages about physics because it will be able to reconstruct them. Likewise, the more the program knows about supply and demand, the more words it can discard when compressing the pages about economics, and so forth.
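A rough sketch of that argument, using a file of addition problems: once the rule behind the examples has been derived, only the questions and a one-line "calculator" need to be kept, and every answer, including ones never seen before, can be regenerated on demand. The numbers here are illustrative.

```python
import random
import zlib

# Ten thousand worked addition examples stand in for the million in the text.
random.seed(0)
problems = [(random.randint(0, 999), random.randint(0, 999)) for _ in range(10_000)]
examples = "\n".join(f"{a} + {b} = {a + b}" for a, b in problems).encode()

# A general-purpose lossless compressor shrinks the file but still has to
# keep every answer in some form.
print(len(examples), len(zlib.compress(examples)))

# "Deriving the principles of arithmetic" replaces all of the stored answers
# with a tiny program that can regenerate them, and any future example too.
def calculator(a, b):
    return a + b

reconstructed = "\n".join(f"{a} + {b} = {calculator(a, b)}" for a, b in problems).encode()
assert reconstructed == examples
```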
Large language models identify statistical regularities in text. Any analysis of the text of the Web will reveal that phrases like "supply is low" often appear in close proximity to phrases like "prices rise." A chatbot that incorporates this correlation might, when asked a question about the effect of supply shortages, respond with an answer about prices increasing. If a large language model has compiled a vast number of correlations between economic terms—so many that it can offer plausible responses to a wide variety of questions—should we say that it actually understands economic theory? Models like ChatGPT aren't eligible for the Hutter Prize for a variety of reasons, one of which is that they don't reconstruct the original text precisely—i.e., they don't perform lossless compression. But is it possible that their lossy compression nonetheless indicates real understanding of the sort that A.I. researchers are interested in?
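As a toy illustration of the kind of correlation being described, the sketch below counts which word tends to follow a short context in a miniature invented corpus and then uses those counts to continue a phrase; nothing about it reflects how a real large language model is built.

```python
from collections import Counter

# A miniature stand-in for "the text of the Web".
corpus = (
    "when supply is low prices rise "
    "when supply is low prices rise quickly "
    "when demand is low prices fall"
).split()

# Count which word follows each three-word context: a crude statistical regularity.
follows = Counter(
    (tuple(corpus[i:i + 3]), corpus[i + 3]) for i in range(len(corpus) - 3)
)

def continue_phrase(context):
    # Choose the continuation seen most often after this context.
    options = {nxt: n for (ctx, nxt), n in follows.items() if ctx == context}
    return max(options, key=options.get)

print(continue_phrase(("supply", "is", "low")))   # prices
print(continue_phrase(("is", "low", "prices")))   # rise
```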
Let's go back to the example of arithmetic. If you ask GPT-3 (the large-language model that ChatGPT was built from) to add or subtract a pair of numbers, it almost always responds with the correct answer when the numbers have only two digits. But its accuracy worsens significantly with larger numbers, falling to ten per cent when the numbers have five digits. Most of the correct answers that GPT-3 gives are not found on the Web—there aren't many Web pages that contain the text "245 + 821," for example—so it's not engaged in simple memorization. But, despite ingesting a vast amount of information, it hasn't been able to derive the principles of arithmetic, either. A close examination of GPT-3's incorrect answers suggests that it doesn't carry the "1" when performing arithmetic. The Web certainly contains explanations of carrying the "1," but GPT-3 isn't able to incorporate those explanations. GPT-3's statistical analysis of examples of arithmetic enables it to produce a superficial approximation of the real thing, but no more than that.
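The carrying failure is easy to make concrete. Below is a sketch of schoolbook addition with and without propagating the carry; the second function reproduces the kind of error described (the operands are just examples):

```python
def add_with_carry(a, b):
    # Schoolbook addition: add digit columns right to left, carrying the 1.
    digits_a, digits_b = str(a)[::-1], str(b)[::-1]
    result, carry = [], 0
    for i in range(max(len(digits_a), len(digits_b))):
        da = int(digits_a[i]) if i < len(digits_a) else 0
        db = int(digits_b[i]) if i < len(digits_b) else 0
        total = da + db + carry
        result.append(str(total % 10))
        carry = total // 10
    if carry:
        result.append(str(carry))
    return int("".join(reversed(result)))

def add_without_carry(a, b):
    # The failure mode: each column is added independently and the carry is dropped.
    digits_a, digits_b = str(a)[::-1], str(b)[::-1]
    result = []
    for i in range(max(len(digits_a), len(digits_b))):
        da = int(digits_a[i]) if i < len(digits_a) else 0
        db = int(digits_b[i]) if i < len(digits_b) else 0
        result.append(str((da + db) % 10))
    return int("".join(reversed(result)))

print(add_with_carry(245, 821))     # 1066
print(add_without_carry(245, 821))  # 66, wrong as soon as a column overflows
```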
Given GPT-3's failure at a subject taught in elementary school, how can we explain the fact that it sometimes appears to perform well at writing college-level essays? Even though large language models often hallucinate, when they're lucid they sound like they actually understand subjects like economic theory. Perhaps arithmetic is a special case, one for which large language models are poorly suited. Is it possible that, in areas outside addition and subtraction, statistical regularities in text actually do correspond to genuine knowledge of the real world?
I think there's a simpler explanation. Imagine what it would look like if ChatGPT were a lossless algorithm. If that were the case, it would always answer questions by providing a verbatim quote from a relevant Web page. We would probably regard the software as only a slight improvement over a conventional search engine, and be less impressed by it. The fact that ChatGPT rephrases material from the Web instead of quoting it word for word makes it seem like a student expressing ideas in her own words, rather than simply regurgitating what she's read; it creates the illusion that ChatGPT understands the material. In human students, rote memorization isn't an indicator of genuine learning, so ChatGPT's inability to produce exact quotes from Web pages is precisely what makes us think that it has learned something. When we're dealing with sequences of words, lossy compression looks smarter than lossless compression.
A lot of uses have been proposed for large language models. Thinking about them as blurry JPEGs offers a way to evaluate what they might or might not be well suited for. Let's consider a few scenarios.
Can large language models take the place of traditional search engines? For us to have confidence in them, we would need to know that they haven't been fed propaganda and conspiracy theories—we'd need to know that the JPEG is capturing the right sections of the Web. But, even if a large language model includes only the information we want, there's still the matter of blurriness. There's a type of blurriness that is acceptable, which is the re-stating of information in different words. Then there's the blurriness of outright fabrication, which we consider unacceptable when we're looking for facts. It's not clear that it's technically possible to retain the acceptable kind of blurriness while eliminating the unacceptable kind, but I expect that we'll find out in the near future.
Even if it is possible to restrict large language models from engaging in fabrication, should we use them to generate Web content? This would make sense only if our goal is to repackage information that's already available on the Web. Some companies exist to do just that—we usually call them content mills. Perhaps the blurriness of large language models will be useful to them, as a way of avoiding copyright infringement. Generally speaking, though, I'd say that anything that's good for content mills is not good for people searching for information. The rise of this type of repackaging is what makes it harder for us to find what we're looking for online right now; the more that text generated by large language models gets published on the Web, the more the Web becomes a blurrier version of itself.
There is very little information available about OpenAI's forthcoming successor to ChatGPT, GPT-4. But I'm going to make a prediction: when assembling the vast amount of text used to train GPT-4, the people at OpenAI will have made every effort to exclude material generated by ChatGPT or any other large language model. If this turns out to be the case, it will serve as unintentional confirmation that the analogy between large language models and lossy compression is useful. Repeatedly resaving a JPEG creates more compression artifacts, because more information is lost every time. It's the digital equivalent of repeatedly making photocopies of photocopies in the old days. The image quality only gets worse.
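The generation-loss claim can be checked directly; a short sketch, assuming the Pillow imaging library is installed and using placeholder file names, that re-encodes the same picture as a JPEG over and over:

```python
# Sketch of generation loss, assuming the Pillow imaging library is installed
# and "original.png" is any image on disk (both are placeholders).
from io import BytesIO

from PIL import Image

image = Image.open("original.png").convert("RGB")

for _ in range(50):
    buffer = BytesIO()
    image.save(buffer, format="JPEG", quality=75)   # lossy re-encode
    buffer.seek(0)
    image = Image.open(buffer).convert("RGB")       # reload the lossy copy

# The saved result carries the accumulated artifacts of every re-encode.
image.save("generation_50.jpg")
```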
Indeed, a useful criterion for gauging a large language model's quality might be the willingness of a company to use the text that it generates as training material for a new model. If the output of ChatGPT isn't good enough for GPT-4, we might take that as an indicator that it's not good enough for us, either. Conversely, if a model starts generating text so good that it can be used to train new models, then that should give us confidence in the quality of that text. (I suspect that such an outcome would require a major breakthrough in the techniques used to build these models.) If and when we start seeing models producing output that's as good as their input, then the analogy of lossy compression will no longer be applicable.
Can large language models help humans with the creation of original writing? To answer that, we need to be specific about what we mean by that question. There is a genre of art known as Xerox art, or photocopy art, in which artists use the distinctive properties of photocopiers as creative tools. Something along those lines is surely possible with the photocopier that is ChatGPT, so, in that sense, the answer is yes. But I don't think that anyone would claim that photocopiers have become an essential tool in the creation of art; the vast majority of artists don't use them in their creative process, and no one argues that they're putting themselves at a disadvantage with that choice.
So let's assume that we're not talking about a new genre of writing that's analogous to Xerox art. Given that stipulation, can the text generated by large language models be a useful starting point for writers to build off when writing something original, whether it's fiction or nonfiction? Will letting a large language model handle the boilerplate allow writers to focus their attention on the really creative parts?
Obviously, no one can speak for all writers, but let me make the argument that starting with a blurry copy of unoriginal work isn't a good way to create original work. If you're a writer, you will write a lot of unoriginal work before you write something original. And the time and effort expended on that unoriginal work isn't wasted; on the contrary, I would suggest that it is precisely what enables you to eventually create something original. The hours spent choosing the right word and rearranging sentences to better follow one another are what teach you how meaning is conveyed by prose. Having students write essays isn't merely a way to test their grasp of the material; it gives them experience in articulating their thoughts. If students never have to write essays that we have all read before, they will never gain the skills needed to write something that we have never read.
And it's not the case that, once you have ceased to be a student, you can safely use the template that a large language model provides. The struggle to express your thoughts doesn't disappear once you graduate—it can take place every time you start drafting a new piece. Sometimes it's only in the process of writing that you discover your original ideas. Some might say that the output of large language models doesn't look all that different from a human writer's first draft, but, again, I think this is a superficial resemblance. Your first draft isn't an unoriginal idea expressed clearly; it's an original idea expressed poorly, and it is accompanied by your amorphous dissatisfaction, your awareness of the distance between what it says and what you want it to say. That's what directs you during rewriting, and that's one of the things lacking when you start with text generated by an A.I.
There's nothing magical or mystical about writing, but it involves more than placing an existing document on an unreliable photocopier and pressing the Print button. It's possible that, in the future, we will build an A.I. that is capable of writing good prose based on nothing but its own experience of the world. The day we achieve that will be momentous indeed—but that day lies far beyond our prediction horizon. In the meantime, it's reasonable to ask, What use is there in having something that rephrases the Web? If we were losing our access to the Internet forever and had to store a copy on a private server with limited space, a large language model like ChatGPT might be a good solution, assuming that it could be kept from fabricating. But we aren't losing our access to the Internet. So just how much use is a blurry JPEG, when you still have the original?