32

Inspired by this question on ASCII, I have wondered similar things about EBCDIC.

At work we have an EBCDIC file that gets sent to a mainframe (I presume an IBM one), and to view it on my laptop I needed to run a command to convert it:

    dd if=blah.ebcdic conv=ascii > blah.txt

Before I found that command I took a peek at the code page to see if I could whip something up myself.
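For anyone without dd at hand, roughly the same conversion can be sketched in Python. This assumes the file is in code page 037, which Python ships as the `cp037` codec; the real file may well use a different EBCDIC code page, and the function name `ebcdic_to_text` is made up for illustration.

```python
# Sketch: decode EBCDIC bytes to text, ASSUMING code page 037.
# Other EBCDIC variants (e.g. cp500) differ in a handful of punctuation
# characters, so the codepage argument may need changing for a real file.

def ebcdic_to_text(data: bytes, codepage: str = "cp037") -> str:
    """Decode raw EBCDIC bytes into a Python string."""
    return data.decode(codepage)


if __name__ == "__main__":
    # 0xC8 0x85 0x93 0x93 0x96 spells "Hello" in cp037.
    sample = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])
    print(ebcdic_to_text(sample))  # prints "Hello"
```

For a real file, the bytes would come from something like `open("blah.ebcdic", "rb").read()`. Mainframe files are often fixed-length records with no newline characters, so the decoded text may arrive as one long line.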

Like ASCII, you can flip a single bit to get from lowercase to uppercase (0x8_ and 0xc_ differ in one bit). However, the letters within each case are not contiguous: the low nibbles 0x_a to 0x_f are skipped. Is there a reason?

Also like ASCII, the digits' low four bits match the value they represent.
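These observations can be checked mechanically. The following is a minimal sketch that assumes Python's built-in `cp037` codec is a fair stand-in for the code page in question; the helper `code` and the constant `CASE_BIT` are names invented for this example.

```python
# Sketch: inspect the EBCDIC layout via Python's cp037 codec (assumed code page).
import string


def code(ch: str) -> int:
    """Return the single-byte cp037 code for one character."""
    return ch.encode("cp037")[0]


CASE_BIT = 0x40  # the one bit that differs between 0x8_ and 0xC_

# Every lowercase letter differs from its uppercase form only in CASE_BIT.
assert all(code(c) ^ code(c.upper()) == CASE_BIT for c in string.ascii_lowercase)

# Collect the low nibbles actually used by letters: 0xA-0xF never appear.
low_nibbles = {code(c) & 0x0F for c in string.ascii_lowercase}

# For digits, the low nibble is the digit's numeric value.
digit_values = [code(d) & 0x0F for d in string.digits]

if __name__ == "__main__":
    print(sorted(low_nibbles))              # [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(digit_values)                     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print([hex(code(c)) for c in "aijrsz"])
    # ['0x81', '0x89', '0x91', '0x99', '0xa2', '0xa9']
```

The last line shows the gaps: the letters run a to i in 0x81-0x89, j to r in 0x91-0x99, and s to z in 0xa2-0xa9.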

EBCDIC Code page

RonJohn
Captain Man
  • 14
    See https://en.wikipedia.org/wiki/EBCDIC for a start, and note the relationships with punched cards and not wanting holes too close to each other for structural integrity. – Jon Custer Jun 26 '19 at 16:39
  • 2
    @JonCuster thanks for the insight, can you post the relation with punch cards as an answer so I can give it an upvote? If you would rather not I can post it myself, I just don't want you to feel like I'm "stealing" it. – Captain Man Jun 26 '19 at 17:52
  • 2
    feel free to steal! It has been a long time since I used punch cards (or dropped them on the floor). – Jon Custer Jun 26 '19 at 17:56
  • I'm not convinced by the logic about avoiding card damage, for two reasons. One is that you often get long runs of holes in the top three rows from alphabetic data. The other is that IBM also used "column binary" format cards where the 24 positions in two columns represented 3 8-bit bytes. Storing binary data (e.g. executable file images) in that format, about 50% of the holes on every card were punched, and that never gave any problems. (We used to ship executable code in column binary format to customers who didn't have any compatible mag tape drives, and it never gave us any transmission errors). – alephzero Jun 26 '19 at 18:30
  • 3
    Radix-sorting cards that contain nothing but letters, numbers, and blanks requires two passes per character position. The first pass sorts cards into one of ten bins based upon the bottom nine rows, and the second sorts them into one of four bins based on the top three. Using more complicated hole patterns would necessitate the use of more passes or more complicated sorting apparatus. – supercat Jun 26 '19 at 18:41
  • I suspect the punch card layout was set up about 1890 – chux - Reinstate Monica Jun 27 '19 at 03:37
  • Note that it is likely the FTP server can do the ebcdic conversion for you. – Thorbjørn Ravn Andersen Jun 27 '19 at 08:12
  • Just a note that using iconv should be easier and safer than dd. – OrangeDog Jun 28 '19 at 11:26
  • EBCDIC is a binary-coded decimal encoding, so it's not surprising that the alphanumerics are discontinuous in hexadecimal. What's interesting is that they're also discontinuous in decimal. – Mark Jun 28 '19 at 20:33

2 Answers

28

There is a clue in the name - BCD stands for "binary-coded decimal", where 4 bits are used to represent 1 decimal digit (0-9). The hexadecimal values A-F are not used in BCD.

EBCDIC is an extended version of BCDIC, and it shifts BCDIC alphanumerics, and inserts characters in some of the non-decimal positions. But there's a simple relationship to ease conversion of BCDIC to EBCDIC.
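To make the "4 bits per decimal digit" idea concrete, here is a minimal sketch of plain packed BCD. It is an illustration only: the function name `to_packed_bcd` is invented here, and this is unsigned BCD without the sign nibble that IBM's packed-decimal format carries.

```python
# Sketch: pack a non-negative integer as BCD, one decimal digit per nibble.
# Illustration only; not IBM packed decimal (no sign nibble).

def to_packed_bcd(n: int) -> bytes:
    """Return n as packed BCD, most significant digit first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = str(n)
    if len(digits) % 2:          # pad to a whole number of bytes
        digits = "0" + digits
    return bytes(
        (int(hi) << 4) | int(lo)
        for hi, lo in zip(digits[::2], digits[1::2])
    )


if __name__ == "__main__":
    packed = to_packed_bcd(1995)
    print(packed.hex())  # 1995
    # Every nibble is in 0-9, so hex values A-F never occur.
    print(all((b >> 4) <= 9 and (b & 0x0F) <= 9 for b in packed))  # True
```

Because each nibble holds exactly one decimal digit, the hex dump of the packed value reads the same as the decimal number, and the nibble values A-F go unused.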

Toby Speight
  • 2
    I suppose this begs the question why BCDIC encoding is not contiguous but as Jon Custer mentioned in a comment it has to do with punch cards and ensuring the holes are not too close together. – Captain Man Jun 26 '19 at 17:27
  • 5
    BCDIC has the same issue, "binary coded decimal" uses 4 bits to encode digits from 0-9, which means hex values a-f will generally not be used. The gaps where the a-f ranges fall will naturally lead to non-contiguous encodings. – Ken Gober Jun 26 '19 at 17:56
  • 5
    @CaptainMan It doesn't beg the question. It raises the question. See https://en.wikipedia.org/wiki/Begging_the_question. – Monty Harder Jun 27 '19 at 22:22
  • 7
    Y'all are thinking too much in terms of bits and bytes in the way we use them today. Computers haven't always been base-2. It's only because 8-bit bytes and base-2 (and particularly, 2's-complement math) are common today that these questions make sense. There is no "space" between "9" and "0" in a decimal computer. – Julie in Austin Jun 27 '19 at 23:43
  • 2
    @JulieinAustin there is however a space between 89 and 91. "Because it's decimal" doesn't justify why the 0 column is also skipped. – OrangeDog Jun 28 '19 at 15:00
  • @OrangeDog - Again, you’re thinking in terms of modern computing. “Counting systems” didn’t always include “0”. It’s a question no one ever asked when I was still punching card decks or using EBCDIC computers. It just “was”. And as with 6-bit machines, people years later also thought 6-bit was dumb. But it was a thing and it sold machines, so ... – Julie in Austin Jun 29 '19 at 12:35
  • @JulieinAustin if you think counting systems never included 10, 20, 30, etc. then you're deluded. And all the time you were using EBCDIC you never used a space, an ampersand or a hyphen? – OrangeDog Jun 29 '19 at 12:54
  • No, I mean 0. As in, 0.000. The number between 1 and -1. And “-1” hasn’t always existed either. But you are wrong if you think “10” was always a “1” in the tens-place and a “0” in the ones-place. There weren’t always ... “places”. Hebrew, for example, has a “10”. And it’s a “10”. Not a “1” in the tens-place, which doesn’t exist. – Julie in Austin Jun 29 '19 at 13:00
21

As pointed out by Jon Custer, part of the reason is that input at the time was on punch cards. If holes were too close together, there was a risk of the card ripping or becoming unreadable.

In addition, this punch card from the Wikipedia article helps explain why both uppercase and lowercase end at 0x_9. The punch card only goes from 0 to 9. I don't know how A through F were entered, maybe different cards or multiple holes (or maybe Wikipedia is wrong and this is for BCDIC, not EBCDIC).

EBCDIC punch card
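As a rough sketch of how the card rows line up with the code page: for uppercase letters and digits, the zone punch (row 12, 11 or 0) picks the high nibble and the digit punch (1 to 9) becomes the low nibble. The table and function names below are invented for illustration, the sketch deliberately ignores punctuation and lowercase, and it uses Python's `cp037` codec as an assumed code page to check the result.

```python
# Sketch: map (zone punch, digit punch) to an EBCDIC byte.
# Covers ONLY uppercase letters and digits; names here are hypothetical.

ZONE_TO_HIGH_NIBBLE = {
    12: 0xC,    # 12-zone + 1..9 -> A..I
    11: 0xD,    # 11-zone + 1..9 -> J..R
    0: 0xE,     # 0-zone  + 2..9 -> S..Z
    None: 0xF,  # no zone punch, single digit punch 0..9 -> '0'..'9'
}


def punches_to_ebcdic(zone, digit: int) -> int:
    """Combine a zone row and a digit row into one EBCDIC byte."""
    if not 0 <= digit <= 9:
        raise ValueError("a digit punch is one of rows 0-9")
    return (ZONE_TO_HIGH_NIBBLE[zone] << 4) | digit


# Punch combinations for the 26 letters, in alphabetical order.
LETTER_PUNCHES = (
    [(12, d) for d in range(1, 10)]
    + [(11, d) for d in range(1, 10)]
    + [(0, d) for d in range(2, 10)]
)

if __name__ == "__main__":
    letters = bytes(punches_to_ebcdic(z, d) for z, d in LETTER_PUNCHES)
    digits = bytes(punches_to_ebcdic(None, d) for d in range(10))
    print(letters.decode("cp037"))  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
    print(digits.decode("cp037"))   # 0123456789
```

Since the digit rows only run from 0 to 9, a single zone-plus-digit punch can never produce a low nibble of 0xA-0xF, which matches the gaps in the code page.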

Captain Man
  • 7
    A..F wasn't entered at all, as input was decimal. Mainframes were made to crank out invoices, all decimal in dollars and cents (or whatever else was used to create debt). Making them binary was already an odd move creating a lot of fights between designers :)) – Raffzahn Jun 26 '19 at 20:53
  • 7
    That card is a standard IBM punched card that uses 12 positions for encoding. Each of the decimal digits is represented by a hole in one of 10 positions. Each letter is represented by a hole in one of three extra positions and one of the digit positions. Other characters are represented by two or three holes in various combinations. BCDIC is a way of compressing the 12 bit code of the card into only 6 bits. – JeremyP Jun 26 '19 at 22:30
  • 1
    I'm not sure what you mean by "how were A through F encoded". They were encoded in exactly the same way as on that punched card. This is a character encoding, not a number encoding. – JeremyP Jun 26 '19 at 22:34
  • @JeremyP: I'd think the easiest way to convert the punched card to a 6-bit binary code would be to use the bottom 9 rows to encode the bottom 4 bits, and the top 3 to encode the other two, but with a tiny bit of extra hardware to swap the codes for blank and zero. Codes with a lower-nybble value of A-F would be punched as 8+2 through 8+7. – supercat Jun 27 '19 at 06:22
  • 7
    @supercat The punched card code came first. There was no need to be able to encode 0xa to 0xf, they couldn't be expressed on the punched card. – JeremyP Jun 27 '19 at 09:10
  • @JeremyP: If one wants to encode 40 characters, that can be accommodated by combining zero or one of (+-0) and zero or one of (123456789). If one wants to accommodate more characters, more punch patterns will be required, and it appears that combinations of 8+(234567) are the patterns which punched cards adopted, which conveniently map to 1010 through 1111. – supercat Jun 27 '19 at 13:54
  • 3
    @Raffzahn it just so happens that the file that originally started all my curiosity is for sending out invoices. :) – Captain Man Jun 27 '19 at 14:50
  • 1
    @CaptainMan ROTFL. Great. Then again, chances were good. Did you know that the vast majority of credit card transactions around the world are still handled on /370 machines and native (often COBOL based) code? So all these fancy new payment methods will end up in EBCDIC and added up using decimal based arithmetic. – Raffzahn Jun 27 '19 at 17:55
  • @CaptainMan - some CDC computers used a 7+9 hole combination to indicate some type of binary format on punched cards. I don't recall if reading nearly laced (all holes punched) on the high speed card readers was an issue. IBM's Fortran IV had an "unformatted" I/O option, but I don't recall if this meant more hole combinations would be allowed on punched cards. – rcgldr Jun 28 '19 at 08:14
  • 1
    @Raffzahn - I’ve written COBOL .. some ... and it’s still the only language for business transactions that Just Plain Works. It’s a shame it’s not a sexy language because it is pretty badass for what it does. – Julie in Austin Jun 29 '19 at 13:06
  • 1
    @JulieinAustin Never argued against using COBOL. As /370 Assembly guy I enjoyed a higher view :)) – Raffzahn Jun 29 '19 at 13:24
  • @supercat I'm not really sure what you are arguing here. Look at the picture of the punched card in the answer. It has an example of every character you could encode on an IBM punched card. It's a character encoding: if you wanted to encode a hex number e.g. 0x1f you'd punch a card with the characters 1 and F on it and then the program would turn it into a number just like it does with ASCII. – JeremyP Jun 30 '19 at 18:06
  • @JeremyP: To punch a single character whose lower nybble is 0x0F, one would punch one of the top three rows along with 8+7. The punched card features four characters with 8+7 punched, and they map to 0x4F, 0x5F, 0x6F, and 0x7F. – supercat Jun 30 '19 at 21:30
  • 1
    @JeremyP: Incidentally, I remember back in the 1970s the university where my mom was learning programming had some keypunches that were limited to the 40 characters whose lower nybble was 0-9, and there were signs up for how to punch the characters whose lower nybble was A-F. – supercat Jul 05 '19 at 16:29