During our June Holidays, we had an interview with Professor Ling San of the National University( NTU). The following are the questions and the transcript of the interview by Bao Rong.
1. From our information, coding is defined as replacing words or phrases by letters or numbers, while cryptography is defined as the study of encoding (encrypting) messages and decoding (decrypting) messages. What exactly is the difference between coding and cryptography?
Ok, cryptography as you rightly said *coughs* is the art or the… science of disguising messages….erm. Its primary purpose is to make sure that only the authorized …person or persons can …obtain the true message. Now er so that is what cryptography is all about, but coding is a much more *coughs*, how should I say, it's a much wider term. Coding basically is, if u like, it's a technique, it's just a technique that *coughs*, that erm uses one... How should I put it? Coding is ya the technique that you use to represent some data. *coughs*. So, for example, the … cryptography in some sense is part of coding alright, in this case the technique you use is some sort of disguising technique, so a very simple form of cryptography for example, is *coughs*, so you take all the letters and shift back one place. Now, that is an encoding process, right, by shifting the letter by one place, that is an encoding process. That process is both cryptography and coding. But, on the other hand, there are some other *coughs*, some other codes or coding techniques which are not for the purpose of cryptography. For example, *coughs*, if you have erm, think of a, like, TV screen or whatever type of screen. Now, it's made up of all this pixels. Right, so each pixel represents a colour, now, how does a computer or electronic device keep track of the colour? *coughs*. They are not recorded as, erm blue or light blue or whatever colour. So, each of this colour is given a certain code, is given a certain string, ok, a certain shade of blue is 1001, and some other shade of blue might be 1011. Ok, so that string that is used to represent the colour is the code. Ok, so basically, I know that u must have heard of ASCII code. Have you heard of ASCII code? Never? A s c ii, ASCII code basically is …is, ya, is, the way it's a fixed length of bits to represent all this symbols on the keyboard. There are many symbols on the keyboard right, but how does the computer know and detect that this is, you know, lower case a, and when we use a shift and make an upper case a? When u hit a certain key, it records it as a bit of zeroes and ones, but there's a fixed way of recording it, and that's called the ASCII code. That has nothing to do with cryptography, nothing to do with finding a data, ok its just how the machines or whatever devices need to understand our language.
2. Following up to the previous question, now that you have explained to us the difference between coding and cryptography, can you please tell us how they are each used, as there seems to be much confusion about the usage of the two words?
*coughs*. Very often, the confusion arises because the layman uses the word “code to represent what u call... er…an encryption technique. Whenever we talk about codes, we use it very loosely to mean some ways of hiding the information… secret codes if u like. So, that's where the confusion arises lah. But if u wan to be scientifically and technically correct, then cryptography is what we always think about just secret codes, but coding itself is jus a process of translating thoughts, if u like, translating human thoughts into the machine understanding, to a machine set of data. That's what coding is all about. So, anything that does a translation is a code. Put it simply, suppose there are only two piece of information that I ever care about, yes and no. Ok, now suppose that I m living in a world where everything must be represented in zeroes and ones. Ok, then I must think of ways to represent these two pieces of info. So I can say 1 is yes and 0 is no. ok once I have done this translation. Over here this is the code . Ok so that is any way of representing a message is a code.
Yes = 1
No = 0
3. Are we affected by codes in our daily life? If yes, how much are we affected by them and why is it so?
Many many ways ok let me ask u lets go back to secret codes, what can u think of what uses can u think of
Wars, game codes, email, have u heard of PGP, pretty good privacy. It is a technique of encrypting, encrypting is disguising, so it is a technique of disguising email messages or any online transactions, so for example if u r going to send your credit card number by internet. You dun wan to be sending it just like that. Otherwise anyone who gets hold of it, can start using tt card to o all sorts of purchases, so when u send sensitive info on the internet, they need to go thru this encryption technique, usually they use PGP, what they do is they use some sophisticated maths and they jumble it into something totally none recognizable, but a the receiving end, so when the bank receive that jumble bit of information they have what we call the decoding algorithm or the decrypting algorithm so they will try to trace back and c if the original number is the valid one. Related to that is…er…do you use atm cards? Not yet? When u go to the atm machine u want to withdraw money u have to key in your pin, now it's very important not to lose your pin, but then what makes u so sure it's safe to use the machines? For example, u pin is 1234, so when u key in, surely the bank keeps a record of al transactions, and when u key in 1234 they machine itself must know how to check if 1234 is the correct pin, but it cannot just simply just check 1234, if it does right, and it keeps it in record, ok, u know er your card is whatever number it is and I just checked that your pin is 1234 which is correct and then you withdraw X amount, and so on right, the next thing is going to happen is someone is going to steal your card, an employee for example, looks at the records and now I knows this guy withdrew X money and his pin is 1234, and next thing I do is, u know, track this guy down, and steal the atm card, and I just key in 1234, and there s no protection. So what do they do? They do something very simple. So u key in 1234, inside the machine, it does some sophisticated mathematics, and they jumble up this number into something which could be, u know, 9999999. And then it checks against its database and says 9999999 is the correct pin for this card and allows u to do the transaction. Now, when the employee comes, he looks at the record and he sees a whole string of 9s, but he wouldn't have thought that came from 1234. Ok? And that is the protection. So cryptography, affects our everyday life more than just war times. Yah, we talked about two issues right, coding and cryptography. Now there are, apart from the secret codes arh, now what are the codes do we know when we talk about. There is another kind of codes that we call error correcting codes. What r this, this r very commonly used. An error correcting code is the following. So, if you think of this very simple example, if yes is 1 and no is 0, suppose that now we need to transmit this information either to a satellite or whatever, unfortunately sometimes when you use those modes of communication, there is interference. Interference, it can flip this bits ok, so suppose ok, er, u make an appointment with your friend, lets go for a movie, if it's yes send a 1 and if it's no send a 0. So your friend sends a 1, but unfortunately along the way interference hits, and this 1 become a 0, ok when u receive it, you wouldn't have imagined that an error has occurred, when you receive the 0, you would have though that, oh well he said no. unfortunately in reality an error has occurred. If in everyday application we need to be a bit more careful. How do we know that an error has occurred, and how do we improvise it, become two rather important issues. Otherwise you will constantly be getting wrong information without realising it. So there's one way to make it detect an error more easily, and that's for example, you repeat the same number once more, now why do I say that it's slightly better? Because chances are, now u r sending two identical bits, assume that along the way interference this, and chances are it hits one bit, we have to make some assumptions lah, we cannot say that all the bits get screwed up; if that's the case then we are living in a rather chaotic world. But if you assume that most of the time things work nicely, but occasionally things get screwed up, then if the person sends two 1s, and interference hits, but not the worst case scenario, in which both of them becomes 0s, only one of them become 0, in that case, that 11 become 10 or 01. Now, when you receive 10, you will say: I don't understand it, because in my world, there are only two piece of information, which are either 11 which says yes or 00 which says no, and now I receive 10 which is neither one. So what logical conclusion must you make? There has been an interference, right? But of course, we don't know which one has been interfered with, correct? But we know there has an interference, so the least we can do is to tell the other party to send me again, you know, something got screwed up, send me again. So we have done one step better, no we can test an error this is important, because in most... communications we want to make sure, that.. That data is not corrupted. Now, but there's still a ...a shortcoming with this particular code. It can detect the error, but it cannot correct the error. Any ideas how we can make it correct the error? *silence* what did we do to improve... the usefulness of the code just now? Add one letter. So how do we improve it further? I know it may not be the best way, but we can add one more. In terms of efficiency, it's rather stupid right? We could have just used one bit, one digits correct, to represent the info but right now we are using three, ok, rather wasteful. But *coughs* suppose now we use triple one to present yes and triple zero to represent no *coughs*. Ok, let's go through the same assumption again, ok most of the time things are okay, but occasionally one thing gets screwed up. So now, suppose again I send triple one, along the way interference hits but it doesn't hit too much. But it hits just one fellow, ok? The assumption is it hits rather few, two is too many, ok, two out of three is a lot. So it hits one fellow, so u r going to end up with let's say, 101, so ok, so cough when you receive the signal u get 101 now just as before we can pair with this two we know that this is not a valid string. Agree? So you say ok, I know now that an error has occurred. But do I know what the original message should have been? The assumption is that as few as possible has changed rte, so starting from 111 and ending at 101 would involve changing just one, starting from 000 and ending up here 101, would involve changing two. So if our assumption is as few errors as possible have occurred, then which would be the logical choice, 111 or 000? “Yes”. So even though... when I receive 101, ok under our assumption that normally things are quite okay, we would have been able to guess that the original message is “yes”. Now such a code, this is what you call a code arh, *coughs* this is called a error correcting code, because when an error occurs, you can correct it back. Now where did I go through this in such a great detail? Because this is used everywhere in our lives without us knowing it. It is in disguise. It's in the computer, *cough* because computer involves a lot of this transmission of information, inside a chip, and whenever there's a mistake, er the consequences can be disastrous, so what the...the manufacturers build in to the computer is… is this code, is this code that can help u detect one error. So each keystroke is represented by a certain number… er a string of certain length, and whatever info is keyed in, it's going to be stored and transmitted as a whole long string. They are sort of broken up and interpreted as sort of certain code words, now the way there are represented , the way they are told to represent this piece of information is such that, if there's one bit in error, it will automatically correct itself back. It's somewhat like the when you use, I'm sure you use Microsoft Word right… it's somewhat like the spell check, you know, sometimes you happen to type a slight mistake right, and it just correct itself right as you move on. Ok it works something like that well, it's hidden in the computers, and it's hidden in CDs, CD players. Same thing. Why? CDs encode a lot of information, and the way CD players read er, is to use lasers. It will just keep reading, now what happens is that it is just a whole string of zeroes and ones, u know, I don't know if u have learned about that, if you cut open a CD and put it under a microscope, you will see a lot of er, ups and downs, ok that's what it looks like, a lot of ups and downs, so I don't remember which is 1 and which is 0, but one is 1 and the other is 0. So it will read that and that will give it a whole string of continuous string of 1s and 0s. Ok? That's how CD players and er disc drives read CDs. But there's one problem. If you make a scratch, that's going to distort the ones and zeroes. Correct or not? That happens; well if it's music, no big deal, you know just some wrong notes here and there, buts if it's a CD-Rom which contains data, that's disastrous. It will just read totally wrong things, or it will give u distorted information. Ok, so *coughs* that w have to a bit more careful, alright s tts why its seem to be an error correcting coded built into it, such that when there's a scratch or when there's dirt, it won't affect, generally speaking, it won't affect the reading of the CD, of course, if you start scratching the entire CD arh, that's an entirely different story, right? If it's just an unintentional scratch, that's just okay. So, that's hidden u know in everyday life. DVD works just like CD, but the quality is better.
4. Following up to the previous question, do you think that many people take codes and cryptography for granted in their lives, and as such ignore or do not bother to find out more about it? Why is this so?
Absolutely right. Because... most of the time, we cannot b bothered to ask why things work right, most of the times we just...we just...it's not just with codes and cryptography right, I mean…it's about anything, rite for example, how often do u ask yourselves how does a computer work, how does a recorder work, how does a handphone work, how does a mp3 work. I dun think we really ask ourselves such questions right? So… that's just human nature lah.
5. What do you think coding and cryptography would be like in the future? Why so?
What do you mean? How will it develop? Ok, cryptography for sure, is going to become more and more important, for the simple reasons that er... we are likely that we r to use more and more electronics transmissions er… electronic transactions, because of that, cryptography is going to be very very important. Coding er as in like error correcting codes will continue to have ... at least the same... importance, because again in all this electronic transmissions, we cannot do without error correcting codes to ensure that the data is... is not corrupted too much. So, but er *coughs* my feeling is that crypto will grow a bit faster then coding, because I think at the moment, most er…practitioners they are quite happy with the state of error correcting capabilities of the codes, but not so much for crypto.
Is number theory related to coding and cryptography? How so?
Absolutely. How? Ok, erm, how, that's a good question….there is a very famous... ok, there is a very famous er……way of encryption called the RSA, I don't know if anyone of you has heard of it? Yah? It stands for …it's the initials of three people's surname, R for Rivest, S for Shamir and A for Adleman, this were three professors from MIT.. I'm not sure if they are still al in MIT now. They designed the simple er... crypto system that is called a public key crypto system. Now, there are two types… have you heard of public key crypto system? *Coughs*. The secret key and the public key. The secret key works in the most intuitive way, ok, like the shifting of the positions in the alphabet one position forward, you know…SO FOR EXMAPLE YOU MOVE THE POSITIONS BY ONE RIGHT? So then, to decrypt it, what you do is to move back, just reverse the process. Now, the public key is a totally different ball game, how does it work? Erm… cough, knowing how to encrypt the message, does not give you any advantage in decrypting or deciphering……er, this is highly non-intuitive right , so this is where we start to use…er rather sophisticated ideas, but very simple mathematics. There is so... this RSA erm, it works like this, I don't know, have you done modular mathematics before? Not yet? Ok never mind, so what it does is this, it takes two huge prime numbers, p and q, ok, it takes two huge numbers, pan q, multiply them and that's called N. ok? And then what they doe sit this, suppose I have a message I call it P, its just a number ok, and erm, what they ……what they does is this, everyone of us in this room, suppose we wan to play this game of sending messages through public key. Every one of us will make our own p and our own q, every one of us will take two big prime numbers and multiply and publish your telephone directory if you like, N. ok? Together with another number e, er... there's not much requirement for e except that p cannot divide e and q cannot divide e…no ... sorry ... except that …that gcd or the hcf, remember your hcf? It's a very strange condition, when (p-1) and (q-1) and e, the gcd or the hcf is 1.everyone of us will pick…oh gcd is greatest common divisor or highest common factor. Same thing, I don't know what you call them in school this days…HCF? Well, ok, so everyone of us will pick our favourite N and e, ok N comes from multiplying two big prime numbers, and e is a number that cannot have common factors with (p-1) and (q-1), ok? Then what we do is this, we publish this, just like our phone numbers, you know, publish in the phone directory or on the internet, and you tell the whole world: “ok, if you want to send messages to me, then you use these two pieces of data.” So what they do is this, suppose I have a P which is my information, P is just…just think of it as a number, ok? You calculate P to the power of e; it's a huge number…ok? You take its remainder, where it divides by N, so never mind about this quotation, what this means is just you take this huge number, divide by N, keep this remainder…ok? Then what you do, that remainder lets call it C. you send C. I receive C. so suppose now you are going to send a message, so you calculate calculate, you get a C you send it to me.
C = P^e mod (p-1)(q-1)
When I receive my C, I am going to take a certain d power…ok? The D power is related to this e, ok? Only I know how to calculate it, because I m the one who chose the p and q correct or not? I m the one who chose, you cannot tell the world what your p and q are arh, if you tell them, the game is gone…ok? You can tell them N and e, so because I know my p and q, I can calculate this special d, and what I do, is when I receive my C, I am going to raise it to the d power, and I am going to keep its remainder when it divide by N…have I lost you, feel free to stop me, you know, if I lost you? Now that will give you P, now let me give you an example. *takes out blank paper* take some examples, ok? Big examples are nasty, let's take p to be 5, and q to be 7, ok in reality we don't take such small numbers; we will take 128 bits of numbers, huge numbers. Ok, so N is 35 correct? But what is (p-1)? (p-1) is 4, and (q-1) is 6, ok? So I need a number which is relatively prime to 24, er which has no common factor with them. So, it's what? 23 for example, right? But 23 is a bit big lah, you are going to raise it to 23 you know,*laughs* ok, erm yah 5. 5 is a nice one right, so let's take 5, 5 is quite small, in fact it's the smallest one. Now, it so happens that, I know but I won't tell you why, but the d in this case is also 5. Ok? If you take … e to be 5, d is also 5. It's not always equal arh, it's a total fluke. Ok, so suppose we start with the message; let's say, 2, ok? Let's say the message “F” is 2, so you are going to send it to me, but you are not going to send 2 arh, so you take 2 to the 5 and then divide by 35, keep the remainder. 2 to the power of 5 is what? 32., so 32 divided by 35 keep the remainder, that is still 32, so C is 32. Ok? C is 32. So this is the number you want to send to me, this is the encrypted form. Now I receive 32, what am I going to do? This is my secret weapon, *coughs* ok; this d is my secret weapon, so when I receive 32 i am going to raise it to the 5 th power, now this is pretty big……ok, this is pretty pretty big*coughs* (laughs at calculator), this is not impossible, it can still be handled by calculators…….33554432, correct arh? Now I divide by 35 and I keep the remainder… (Uses computer calculator)…14…that's wrong, where did I go wrong? …… (Double checks calculator) maybe it's a rounding error...no this cannot be rounding error……arh I made a mistake… er…33445532…sorry sorry, so this number divided by 35, is some number: *.057blahblahblah. Ok? So…yah, so the remainder only comes from this part right, so this answer is two. The remainder is 2, correct or not, the remainder is because of here right, so the question is what number divided by 35 gives you this (the decimal places) right, and its true. That means that remainder is 2, ok? I am sure you guys know programming right? Huh? That's god enough; I mean instead of getting calculators and so on, you can write a very simple program, right just do the division get rid of the integer part, and then this part you multiply by 35 to get back the remainder, ok? So, this is 2, and you c I have retrieved the original. Now this is a very very simple technique from the RSA…and there's a big company that depend on this simple algorithm to make huge bucks, it's called RSA. This company is called RSA, its based in the silicon valley. Because they discovered this algorithm, they patented it, you know and they transformed it into workable forms, and they made big bucks. Now this is the public key cryptography. Ok? Erm, now I forgot why I came all the way into the secret key and public key and all that, what is the question? ...arh, there you c, that's number theory, that's the easiest sort of number theory. But er, this ppl use more and more sophisticated number theory, but this is sth which I am sure you can understand.
7. What do you think are relevant (very important) mathematical concepts linked to coding and cryptography? In your opinion, which is the most important concept?
Oh everything…erm...really everything. Yes, combinatorics … *laughs* com-bi-na-to-rics, yah, c-o-m-b-i-n-a-t-o-r-i-c-s. its about how you mix and match things. Ok? Erm, algebra, its very important erm… these are very important, but there are also other important theories, like probabilities. Ok? That is important to analyse the errors and so on.
8. Until now, unbreakable codes have been created by someone, and at other times broken by someone else. Which have been the more “powerful” or important since the past? Which do you think will win in the “race”, the code-makers or code-breakers? Why so?
Beats me, beats me, because er *coughs*, we continue to get surprises…in fact; one area I forgot to mention is quantum theories, quantum physics. Ever heard of quantum physics? Yea, quantum physics is used to design some of the latest idea of codes, but unfortunately quantum theory is still a theory, it's very difficult to realize it. Up to now, we have only built a quantum computer that can factorise up to 16, if you give it a number up to 16, it can factorise it, give it a bigger number it doesn't know how to factorise…factorise mean getting all the prime factors out. *Coughs* so the quantum theory, that is it so far.
*Laugh* breaking it won't make you money right, *laughs* the challenge is always to make a… a, so here when we are toking about codes, we are always toking about secret codes arh…the challenge is always to make a kind of code that no one can ever break…what is true is that up till now there is no code that ppl have ever designed that they can prove with 100% certainty that no one can ever break systematically……that is really the biggest challenge…all codes that ppl made, designed *coughs* er can be proved to be quite safe, if you make certain assumptions. So, let me give you an example, like this RSA method that I just told you about, it's incredibly simple, you know, it depends on one very important assumption…it…it assumes that factorising a number is very difficult……what do I mean by that ? Ok? If I give you a small number like 6, and ask you to factorise it. Big deal, immediately you tell me 2 times 3 coughs, even if I give you something bigger, you know, like 54, yes you can start peeling off the factors one by one. Ok? *coughs* But if I give you 144169 and ask you to give me all the prime factors, that's going to take you quite a bit, and it's not a coincidence, factorising a number is very difficult, *coughs* or at least most ppl believe that it is very difficult, in the sense that no one has been able to do it in a very smart and clever and fast way so far. Ok, now this RSA crypto system, we call it, *coughs* depends on this on assumption……that factorising a number is very difficult, because why? Everything voice down to this N, which is made up of the product of two prime numbers. If I can come up with a very fast and clever way of factorising all numbers, then the moment I c this N, I say, “oh, here are my p and q, if I know my p and q, I can calculate my (p-1) and (q-1) and that actually there a very systematic way to derive d. Ok? Once, we know “(p-1), (q-1) and e” there's a very systematic way to calculate the e. I can do it…so you know the d, if you know the d, you can do the decrypting already, ok? So, the moment you have a fast way of factorising a big number, I tell you, this RSA Company will go bankrupt overnight. So, that ...that is where the challenge lies.
9. What do codes do and how they can they improve someone's daily work?
In the obvious way, you know, the secret codes right, crypto system, er if they are easy to use, and they are efficient, and they are very safe, that will enhance our security a lot, so you can feel safe, you know, doing all your banking on the internet and not to worry about hackers and so on cough or even for military purposes, you will feel a lot safer right, as your … your er secrecy is ensured and… in terms of other codes like the error correcting codes I told you about, it will arh, coughs have again more efficient way of detecting and correcting errors, which will lead to more…more efficient products lah.
10. In the past, are there any major historical events that are affected by codes in some way or another? What is it, and why is it affected so much by them?
Major historical events arh? *coughs* ……I am trying to think, nothing spins to my mind, (Bi Ran: maybe this country develops a sort of code and uses it to transmit secret messages over, and then they win a war because their enemies cannot understand)...oh that. That, I am sure happens most of the time, but er…you might just want to search the internet…like all those Caesar ciphers, they might have some stories that go around them…ok *coughs* Most of the time in the field of crypto, you know, any news you hear is bad news. Because when a code works well, everyone is just going to keep quiet about it but when you crack a system, when n you crack a code *coughs* then the whole world gets to hear about it. You see …erm that's the end of all that supposedly brilliant idea, so…but I think most ppl will just keep quiet because it doesn't do them any good what…except to…to…boost their own ego a little bit…if you can crack other people's code, what good doe sit do to u. ok? If you shout about it, then of course there is going to be a change in the code, and then your fun is gone, right? But if you keep quiet about it, you can continue to keep breaking it……You've run out of questions, is it? *laughs*
11. Do you have any other information you want to tell us about with regards to our topic?
Well, I can show you a few sites arh………I can show you how a DVD player works…this picture has a name arh, now I am going to choose the speed at which the…correction ...I am going to introduce some errors in this picture afterwards, and then I am going to correct it, now what you see is this, the ways codes are used to correct errors, is by using what we call redundant information, just like just now the yes and no code arh, when it's very plain arh, with no extra information, it's just 0 and 1, it cannot correct, but if you introduce a lot of redundant information, so I repeat the 1 several times and the 0 several times, then suddenly it ... it er gives me this extra ability to detect and correct the mistakes, so this things that you c here , these dots here, are basically the extra bits of information that we put into the picture to allow us to correct mistakes later. Ok? So, now we introduce errors, so let's take... er… a laser lade and start disfiguring her (refers to DVD picture on computer) Ok? And now, we correct, so let's… now you see er, it's doing the correction, it's cleaning up her face…you c all the scars have gone, 100% of the errors were corrected...u c that? That is how the CD and DVD works imagine the same scratch right. So, now, suppose I continue and I become a bit more vicious and disfigure her in ... in such a terrible way, right , and I am trying to give her some cosmetic surgery…look at this she's only 30% restored right, or permanently disfigured. Again, this gives you an illustration of how the DVD or CD works, if you start scratching your CD like that right, and then there will come a point where it's beyond the point of no return, so you can never retrieve that data anymore. Ok? Decode is the same, oh it asks it to clean up further……this are the errors, so this is...is an example, let's see… any other sites…this are all on error correcting codes , this is an interesting articles, ok this are not for the secret codes arh, this are not. This article might still be there...yes it tells you how it is being used in application, the key page is this, my old web page Inthe nus maths website, hten you go to the advanced coding theories, and there's some links. Reed Solomon code is the one used in CDs. Shannon was the one who started the error correcting codes. Hemming is an interesting guy, he also started implementing this error correcting codes, ok, he started working and…well this article is pretty wordy, so I don't think you would be so interested. But, there's this site arh, http://www.howstuffworks.com , ok there's some good information regarding coding and cryptography.