59

When did people start referring to an ordered group of characters as a "string"? Did this name come from before / outside of the computing field, or is it special to computing?

The metaphor is clear enough, I suppose: the characters are "strung together" like beads in a necklace. This comparison could describe anything in computing that takes place in sequence, which is a lot of things. Why did it get applied to sequences of keyboard characters in particular?

Does the name come from typesetting or something? I wonder why we didn't end up with a term like "text" or "phrase" that's more familiar to outsiders than "string" is.

user3840170
John Skiles Skinner
  • 3
    It has to be older than computer terms being popular, since I remember the term “string a bunch of letters together” from the 1970s before I knew anything about computers. And when I did take my first BASIC class (in 1980) the term was completely recognizable. – RonJohn Jan 25 '23 at 19:44
  • 17
    The earliest paper I’ve found in the ACM Digital Library referring to character strings was published in 1958 and doesn’t bother introducing the terminology, which suggests it was already in use by then. – Stephen Kitt Jan 25 '23 at 20:02
  • 3
    https://stackoverflow.com/questions/880195/the-history-behind-the-definition-of-a-string has some guesses… – Jon Custer Jan 25 '23 at 20:08
  • 3
    And https://softwareengineering.stackexchange.com/questions/43329/etymology-of-string – Jon Custer Jan 25 '23 at 20:09
  • 1
    Lots of things come in strings - flags, christmas lights, hit records, .... – dave Jan 26 '23 at 01:29
  • 1
    If we called strings phrases, what would we call phrases? – dave Jan 26 '23 at 01:31
  • Conventionally, also a string of racehorses. Though I would be astonished to find that's the origin! – Toby Speight Jan 26 '23 at 05:26
  • 1
    It is a little-known fact that prior to the invention of the term 'string' in the computing domain, the French were forced to refer to collections of tangy root vegetables as "char[]'s of onions". – Eight-Bit Guru Jan 27 '23 at 04:30
  • 1
    "string" is a synonym of "sequence" in Mathematics. "String" in computing is a condensed form of "character string"; there can be non-character strings, but the dominance of "string" denoting text has mostly deprecated usage like "string of integers". However, we have "bit string", "byte string" and "binary string". – Kaz Jan 27 '23 at 20:26

7 Answers

32

The oldest occurrence I know of is from 1918, so much older than those in the existing answers (at least for its use in mathematics and logic/computation). This is from the book:

C. I. Lewis. A Survey of Symbolic Logic. Berkeley: University of California Press, 1918.

For example, on p. 355, he writes (emphasis mine):

A mathematical system is any set of strings of recognisable marks in which some of the strings are taken initially and the remainder derived from these by operations performed according to rules which are independent of any meaning assigned to the marks. That a system should consist of ‘marks’ instead of sounds or odours is immaterial.

Barmar
  • A great find from a fascinating book! Is the author quoting Whitehead and Russell? I can't quite tell if it's intended as a quotation – John Skiles Skinner Jan 26 '23 at 15:57
  • 1
    @JohnSkilesSkinner No, I am quoting Lewis. He isn’t quoting anyone. – Carl-Fredrik Nyberg Brodda Jan 26 '23 at 16:02
  • 6
    @JohnSkilesSkinner The oldest occurrence in a "computational" context I was able to find is from 1878, https://www.google.com/books/edition/The_Intellectual_repository_for_the_New/wUMEAAAAQAAJ?hl=en&gbpv=1&dq=%22string%20of%20digits%22&pg=PA486&printsec=frontcover referring to a number so large that it is perceived just as a "string of digits" rather than a number with an understandable magnitude. – Leo B. Jan 26 '23 at 18:26
  • @LeoB. Your example of "string of digits" falls out of the category of abstract mathematical concept (there is nothing abstract about it) and into the category of "a number of objects arranged in a line". This sense was, as another answer notes, already recorded centuries ago. So your example is neither the first example of the use of "string" as an abstract mathematical concept nor as a term in English. – Carl-Fredrik Nyberg Brodda Jan 26 '23 at 21:29
  • 4
    @Carl-FredrikNybergBrodda The question as formulated does not ask about the first example of the use of "string" as an abstract mathematical concept. You've invented that aspect yourself, and now you're forcing it on others. Don't do that. – Leo B. Jan 27 '23 at 09:12
  • @LeoB My point is that, as stated, the question has an answer dating back to the 15th century, so your use from 1878 is very modern. The more interesting question (as confirmed by the OP, I have no idea what you mean by forcing) is its use as an abstract term in mathematics. – Carl-Fredrik Nyberg Brodda Jan 27 '23 at 10:04
  • @Carl-FredrikNybergBrodda A string of digits (e.g. decimal) is absolutely a mathematical object. The digits are the coefficients of a polynomial which evaluates to the digits value, when the base is substituted into the polynomial. For instance given p(x) = 1x³ + 2x² + 3x + 4, p(10) evaluates to 1234. We could express the polynomial as a vector dot product [1 2 3 4]・ [x³ x² x 1]. The vector on the left is our string of digits. – Kaz Jan 27 '23 at 20:32
  • I have marked this answer as correct though it is a little brief. It seems clear based on multiple sources that "string" comes from math, logic and formal systems. This answer is quite an early illustration. I have updated Wikipedia with the C. I. Lewis quotation. – John Skiles Skinner Jan 29 '23 at 21:42
28

This question was asked on Stack Overflow, but closed as off-topic there. Before it was closed, it received this answer (lightly edited by me):

I had guessed that "string" was in use by mathematicians long before its adoption in programming languages. Turing machines effectively operate on strings. Turing may not have used the term, but it is used everywhere in automata textbooks, going back decades.

The earliest reference I could find was a fragment in Google books of a 1944 article "Recursively enumerable sets of positive integers and their decision problems" by logician Emil Post in Bulletin of the AMS. I think there is little doubt that he is using "string" in the conventional sense used in computer science. Page 286 contains:

For working purposes, we introduce the letter b, and consider "strings" of 1's and b's such as 11b1bb1. An operation on such strings such as "b1bP produces P1bb1" we term a normal operation. This particular normal operation is applicable only to strings starting with b1b, and the derived string is then obtained from the given string by first removing the initial b1b, and then tacking on 1bb1 at the end. Thus b1bb becomes b1bb1.
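Post's "normal operation" can be sketched in a few lines of Python. This is only an illustration of the quoted rule, not anything from Post's paper; the function name `normal_operation` and its parameters are made up for the example.

```python
from typing import Optional


def normal_operation(prefix: str, suffix: str, s: str) -> Optional[str]:
    """Apply the rule "prefix P produces P suffix" to the string s.

    Returns None when the rule is not applicable, i.e. when s does not
    start with the required prefix.
    """
    if not s.startswith(prefix):
        return None
    # Remove the initial prefix, then tack the suffix on at the end.
    return s[len(prefix):] + suffix


# The rule from the quotation: "b1bP produces P1bb1".
print(normal_operation("b1b", "1bb1", "b1bb"))     # b1bb1
print(normal_operation("b1b", "1bb1", "11b1bb1"))  # None (does not start with b1b)
```

The first call reproduces the example in the quotation: b1bb becomes b1bb1.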

Paul Callahan

Glorfindel
Toby Speight
14

In computer science, it is sometimes deemed necessary to distinguish between the data and its representation, to be able to formulate thoughts like "lines of text are stored in computer memory as [explicit] strings of characters" (as opposed to, for example, their offsets in a file, etc.).

The word "string" comes naturally for that synonymous, but not quite equal, usage: for the English noun "string", the sense of "a number of objects arranged in a line" is first recorded in the late 15th century. (Emphasis mine - L.)

Explicit occurrences of the phrase "string of characters" in 19th-century books can be found in abundance; for example,

This literary trifling is obviously quite useless as a means of indexing for reference, unless the whole string of characters be learnt by rote

is quoted from Notes and Queries on China and Japan - Volumes 1-2 - Page 74, 1867.

Leo B.
  • Do you have a citation for its first use in a computing context? Not a problem if you don't - this answer is already useful to demonstrate that "string of characters" wasn't too uncommon prior to electronic computing. – Toby Speight Jan 26 '23 at 05:28
  • @TobySpeight My attempts to use Google Books for that failed, because, for example, a COBOL manual published after 1960 was found using the search for "string of characters" in publications between 1920 and 1950, and attributed to 1934. Reverifying every found item would be tedious. However, it would be a fair assumption that cryptographic contexts would precede computing contexts by several decades. – Leo B. Jan 26 '23 at 06:30
  • 3
    @TobySpeight Re cryptographic contexts: https://www.google.com/books/edition/Everybody_s/_79mdkKZSNEC?hl=en&gbpv=1&dq=%22string+of+characters%22&pg=RA1-PA45&printsec=frontcover is indeed a 1925 publication – Leo B. Jan 26 '23 at 06:42
  • A worthy mention might be the idiom "(he came out with...) a string of expletives". String is not a very common word, but it's quite obvious that it has a broader meaning as a "series" or "sequence". – Steve Jan 26 '23 at 10:16
  • ‘String’ is a relatively common word, though not necessarily in the meaning of ‘sequence’. – user3840170 Jan 26 '23 at 11:54
  • Hard to see how, "arranged in a line" leads to "string." I'd guess it more likely that somebody said, "arranged like beads on a string." One pertinent fact about a string of beads is, you can't change the order of the beads without breaking the string. – Solomon Slow Jan 26 '23 at 12:58
  • @user3840170, yes certainly in general "string" is a common word. I meant there is an established, if less common, usage in those senses I mentioned. Another linguistic-related idiom is "he couldn't string a sentence together". – Steve Jan 26 '23 at 13:31
11

The word is used in an 1834 treatise on the potential power of Charles Babbage's Difference Engine.

Mr Babbage's invention puts an engine in the place of the computer; the question is set to the instrument, or the instrument is set to the question, and by simply giving it motion the solution is wrought, and a string of answers is exhibited.

While this may not be quite the answer asked for, the word's use in a related sense at the very beginning of computing history suggests that the term still carries much of its original non-computing meaning in contemporary usage, so looking for a strictly defined origin may be an interesting but ultimately inconclusive exercise.

Stacker Lee
  • 2
    I too doubt it's the answer asked for, but upvoted because it is certainly an interesting exhibit! – davidbak Jan 26 '23 at 22:59
9

Knuth frequently gives a complete definition of a word, including its etymology, derivations, derivatives, and the names of the people who invented it, back to Babylonian times or further, but in this case he disappoints, at least in v1 of TAOCP, "Fundamental Algorithms", where "string" appears several times in the index but none of the references are to a history.

But consider the very first use of "string" (according to the index) in TAOCP, v1 3e §1.1 pg 8:

If we wish to restrict the notion of algorithm so that only elementary operations are involved, we can place restrictions on Q, I, Ω, and f, for example as follows: Let A be a finite set of letters, and let A* be the set of all strings on A (the set of all ordered sequences x₁x₂ … xₙ, where n ≥ 0 and xⱼ is in A for 1 ≤ j ≤ n). The idea is to encode the states of the computation so that they are represented by strings of A*.

The context here is that of an algorithm on sequences, finite and infinite, drawn from a finite set of letters A and "the set of all strings on A (the set of all ordered sequences ...)". (Emphasis mine.)
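To make that definition concrete, here is a minimal Python sketch that enumerates the strings on a finite alphabet. It is only an illustration: the set of all strings on an alphabet is infinite, so the hypothetical `max_len` parameter (not part of Knuth's definition) cuts the enumeration off at a chosen length.

```python
from itertools import product


def strings_on(alphabet, max_len):
    """Yield every string on `alphabet` of length 0..max_len, shortest first.

    Length 0 is included, matching the "n >= 0" in the definition, so the
    empty string is always the first result.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


print(list(strings_on("ab", 2)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```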

So that tells me that in his mind the word "string" comes from formal language theory, so I suspect the answer lies there. That theory uses common words as formal terms, such as "word" and "alphabet" and "sentence". The study of formal languages goes way back so it might even be the case that the word "string" comes from a translation of a word used by some investigator writing in some other language. And of course from a computer programming point-of-view, formal languages are the basis for a lot of the seminal work in parsing and compiling, and those were very early concerns in the history of programming.

(Sadly I am not a research librarian so I cannot complete this "answer" with the actual correct facts. But maybe someone else can "do a Knuth" here. But see also Carl-Fredrik Nyberg Brodda's answer that goes back to a possible different mathematical progenitor.)

I would have to say that the otherwise completely excellent and enjoyable book Jewels of Stringology: Text Algorithms (Crochemore, Rytter, 2003) disappointed me here, too. It just asserts in the Preface:

The term stringology is a popular nickname for string algorithms as well as text algorithms. Usually text and string have the same meaning. More formally, a text is a sequence of symbols.

And then it goes on from there for 280 very interesting pages covering several dozen algorithms ...

(I just mentioned that book because I love the term "stringology".)

update

Further to the idea this comes from formal language theory by way of interest in parsing and compilers, consider the paper The Syntax And Semantics Of The Proposed National Algebraic Language Of The Zurich ACM-GAMM Conference (Backus, 1959). That language being discussed of course is what became ALGOL 60.

On page 16 we see a discussion using "strings of symbols" - that is, as used in formal language theory:

[image: excerpt from page 16 of Backus (1959)]

And then later, on page 17, we see "string" used in an informal description, in the grammar itself, of what we now call a "string":

[image: excerpt from page 17 of Backus (1959)]

and a few lines later as the name of a non-terminal in the very familiar usage "quoted string":

[image: excerpt from page 17 of Backus (1959)]

(I just love these old papers where it was the responsibility of the department secretary to laboriously type the paper from a manuscript, taking care to leave extra blank space everywhere she (inevitably, then, a "she") would have to go back later and ink in the math symbols by hand. Actually, I grew up in that era, and saw it happen in our Mathematics department at college. Those ladies were skilled. It was non-trivial to read the manuscripts of our college professors!)

davidbak
  • 2
    Interesting! Does Knuth directly state it comes from formal language theory? Or, does he simply use it in that context without explicitly stating it? What page of TAOCP? – John Skiles Skinner Jan 26 '23 at 22:46
  • 1
    @JohnSkilesSkinner - edited answer to include photo of book - but answer is no, he doesn't state that, he just uses it immediately (pg8!) in that context. – davidbak Jan 26 '23 at 22:55
8

'String' is - AFAIK - simply short for 'string of characters', shortened the same way as we say 'float' instead of 'floating point'.

As such it is a common-language image, independent of and far older than (modern) computers. It works the same way with a string of pearls, a string of turtles or a string of stars.


Common Language vs. Expert Jargon

Words of a specialized jargon almost always grew out of common words, much like here. Later on, users of that jargon think first and foremost of the specialized meaning, unless forced by the environment or additional information to consider otherwise.

Just talk with an architect about a string without giving further clues. Both of you will think you understand what it's about until the astonishing moment when it becomes obvious that he is talking about a line of bricks, while you were quite sure that characters were meant - and vice versa.

Jargon is most of the time created from common words. After all, the words needed don't exist at that point in time. So creation is usually done by description ... like 'a string of characters'. Later, such descriptive terms often get shortened in a way that is still understandable within that trade ('string of characters' -> 'character string' -> 'string').

This (re-)use of common words is by no means exclusive to English, and becomes obvious when looking at the same term in other languages. German, for example, uses 'Zeichenkette' or 'chain of characters' (literally 'character chain' *1, *2), which uses the same image, albeit based on what is common in that language, so here a chain instead of a string.

It's the natural thing to happen - just think what words you would use to explain a database key entity relation to your great-great-aunt - in her understanding a key is made of metal and you're her relation :))

But That Must be More Complex

A habit often seen when such questions are asked or discussed, as in the comments, is that people are so rooted in today's understanding of expert jargon that they have a hard time imagining a world where that jargon was not yet settled, one where everyday words were used to describe the new thing. Surely there must have been a secret meeting defining that new word for very logical reasons, or at least by some law-like decree. It can't be that it just grew. Never :))


*1 - Interestingly as well in French, with 'chaîne de caractères'

*2 - Which BTW opens a related, quite interesting historical distinction: in cryptographic analysis before computers, the German term for what we now call a string wasn't Zeichenkette but Zeichenfolge - literally 'character sequence'. So there was a settled term that could have been used, and for some time was used, but it vanished.

Raffzahn
  • Would be interesting to learn why this answer would be considered not useful. – Raffzahn Jan 26 '23 at 01:42
  • 4
    When you consistently set a high bar people come to expect you to meet it. – davidbak Jan 26 '23 at 04:11
  • 1
    @davidbak So you're saying I'm cursed because the reason behind it is too mundane? Interesting. :)) – Raffzahn Jan 26 '23 at 04:23
  • 11
    Because it’s a guess/assumption more suited to comment than answer. – RonJohn Jan 26 '23 at 04:43
  • 1
    @RonJohn Would you also call it a guess to note that a bird is called a bird? String is the literal meaning of items in a sequence. – Raffzahn Jan 26 '23 at 05:36
  • 4
    @Raffzahn But why not "array"? Or "sequence"? "Line"? "Sentence"? – wizzwizz4 Jan 26 '23 at 08:23
  • 1
    Also note, other answers provide quotations about strings of other things besides characters. – Solomon Slow Jan 26 '23 at 13:02
  • @wizzwizz4, I'd suggest one answer is that all those words already have other meanings in computing and/or writing. An array of characters is such a common thing, and so often handled and manipulated as a single unit, that people would tend to want a distinguishing word for it. "String" is monosyllabic and relatively unencumbered with other contextually-feasible meanings. And as I mention in a different comment, it did have a known use as meaning some sort of "series of words or utterances", not necessarily implying "sentences" or other well-formedness. – Steve Jan 26 '23 at 13:50
  • 6
    @Raffzahn The answer is dismissive, and is basically repeating information which is already in the question. -- It also completely omits answering any of the explicitly articulated questions: When did this usage start? Did the usage with ordered groups of characters originate outside of computing? Why is "string" specific to characters (e.g. we say an "array" of ints, not a "string" of ints)? Were there particular reasons why other potential terms (like "text" or "phrase") weren't chosen? – R.M. Jan 26 '23 at 16:02
  • 1
    "String is the literal meaning of items in a sequence." Integers are strings of bits. Floats are strings of bits. Arrays are strings of whatever. – RonJohn Jan 26 '23 at 17:30
  • 1
    @R.M., nobody in practice says "a string of characters", except when they are being emphatic about the definition of the word "string" itself. A string means "an array of characters" - it's an array of a specific kind. The reason we don't say a "string of ints" is for the same reason we don't say a "toilet of rice" - because a toilet is a specific kind of bowl, not a synonym for the word bowl. Also, "text" tends to mean information in terms of the alphabet - a string can be entirely symbolic or numeric. It's a push to regard £**100.00 as "text" in the pre-computer era – Steve Jan 26 '23 at 22:22
  • 1
    @Steve Yes, and I was simply pointing out that the question is asking why we specifically mean characters when we say "string", something this answer doesn't even attempt to address. If it "Works the same way with string of pearls, string of turtles or string of stars", why doesn't it work the same way for string of ints? When and how was that decision made? That's what this question is ultimately asking. -- Note that I'm not asking new questions for myself, I'm simply pointing out why this answer doesn't cover the questions which were already asked by the OP. – R.M. Jan 26 '23 at 22:51
  • 1
    @R.M., it seems quite plausible to refer to a "string of numbers" outside a computing context (e.g. "I asked him why profits were down, but he just gave me a string of numbers!"). There is no "string of ints", because an "int" quite clearly establishes the context as computing, and in computing the word "string" has a specific meaning which is not a synonym of "array" - just as "toilet" is not a synonym of "bowl". The OP's own question already gives enough etymology and metaphor to explain why the word string has its meaning. – Steve Jan 26 '23 at 23:05
  • @R.M. Going back you may notice that character strings were as well called arrays - Dartmouth BASIC even required the use of array functions to handle them. – Raffzahn Jan 26 '23 at 23:50
  • 2
    For a string to be also an array, it surely would have to be indexable. Algol 60 strings were not indexable. – dave Jan 27 '23 at 00:04
  • @wizzwizz4 Because "array" was, long before computers, already a well-defined term for multidimensional ... well ... arrays. A string doesn't have multiple dimensions, so it's at best a vector - except that is also already taken for numbers. (It helps to remember that today's assumption that a character is well described as a single numerical value wasn't a given back then. Characters were Hollerith codes.) – Raffzahn Jan 27 '23 at 00:12
  • @Raffzahn - Baudot codes, surely? ;-) – dave Jan 27 '23 at 12:30
  • @another-dave Real computers (aka data processing) used Hollerith codes. Baudot was only a thing for wannabe computers, which rarely handled character data at all. The term 'string' as a data element only becomes necessary if it gets handled, which needs lots of memory, something early computers of all sizes were short of - except that real computers had those incredibly useful 187x83 mm devices supplying and carrying away near-endless amounts of data :) – Raffzahn Jan 27 '23 at 16:12
8

FORTRAN did not have strings: it had "Hollerith constants". "Characters" were added in F77.

COBOL60 had 'characters', making up 'words' of up to 30 characters. http://www.bitsavers.org/pdf/codasyl/COBOL_Report_Apr60.pdf

Dartmouth BASIC did not have 'strings' when it was introduced in 1964. Pascal and C (1970 and sometime after 1970) did not have "strings" but did have "character arrays".

By 1974, BASIC did have strings and people knew what they were: the 1974 Pascal "User Manual and Report" references "algol bit strings" (sets), but also the absence of (character) string operations "that users may expect". http://prog.vub.ac.be/~tjdhondt/ESL/Pascal_files/PASCAL%20user%20manual%20and%20report.pdf

Although the origin of the word is obvious and historical, it wasn't widely used in the modern sense until sometime between 1970 and 1975.

david
  • 1
    Don't limit yourself to variable types, consider literal strings as well. "Literal strings of characters" appears in COBOL manuals in the early 1960s. – Leo B. Jan 26 '23 at 08:33
  • Neither Ritchie nor Wirth chose to use the term in their first iteration around 1970. I say "not widely used". – david Jan 26 '23 at 09:01
  • 2
    Algol 60 had strings by that name, so the 1970s is way too late a date; the Report was published in 1960. IAL (Algol 58) allowed for extralingual strings: procedures may require as parameters quantities outside the language; e.g., a string of characters (...). – dave Jan 26 '23 at 12:09
  • @david - I am unclear what "first iteration" refers to, but the original K&R book on C uses the term "string" extensively, for example in "string constant". – dave Jan 26 '23 at 12:22
  • 2
    SNOBOL4, an early 1960s language explicitly designed for processing strings, uses that word to describe itself: the basic data element of SNOBOL4 is a string of characters. The '4' is 'version 4', the term was used in earlier versions, as this ACM citation demonstrates. – dave Jan 26 '23 at 12:27
  • 1
    Jean Sammet's 1960s survey, Programming Languages: History and Fundamentals, uses the term, and I infer it was commonplace by that date. – dave Jan 26 '23 at 12:38
  • PL/I had both character strings and bit strings. – John Doty Jan 28 '23 at 02:09