Appendix C: Interview with Professor Ling San
1. From our information, coding is defined as replacing words or phrases by letters or numbers, while cryptography is defined as the study of encoding (encrypting) messages and decoding (decrypting) messages. What exactly is the difference between coding and cryptography?
Ok, cryptography, as you rightly said *coughs*, is basically the art or science of disguising messages, its primary purpose if to make sure that only the authorized person or persons can obtain the true message…now er, that is what cryptography is all about. Now coding is a much more *coughs* how should I say, it's a much wider term, ok? Coding basically is, if u like, a technique, is just a technique that coughs that er uses…one…how should I put it, coding is…ya the technique that we use to represent something *coughs. For example, the…cryptography in some sense is part of coding, alright? In this case, the technique you use is some sort of disguising technique. Well, a very simple form of cryptography for example, is *coughs*, ok so you take all the letters and just shift by one place right? Now that is an encoding process, by shifting the one place, that's an encoding process. So, that process is both cryptography and coding. Ok? But on the other hand, there are *coughs*... some other codes or coding techniques which are not for the purpose of cryptography. Example, *coughs* if you have a…ok think of a TV screen. LCD screen or whatever screen alright? Now, it's made up of all this pixels. Ok? So, each pixel represents a colour. Now but how does a computer or how does an electronic device keeps track of the colours. *Coughs* It doesn't keep…it doesn't record it as you know…blue or light blue, or whatever shade of blue. Right? So, what is done is that each of these colours is given a certain code, is given a string. So maybe a certain shade of blue is 1001. And some other shade might be 1011. Ok? So, that string that is used to represent the colour, is…is a code, is a code that represents it, ok? So basically…I don't know, you must have heard of ASCII code, have you? …A-S-C-I-I. Ya, ASCII basically is it's a fixed… it's a fixed length of bits to represent all this symbols (on the keyboard). Ok? There are many symbols on the keyboard right? How does a computer know that when you type this, this is you know, lower case a, and then when you do a shift and this is upper case a. So, what it does is, when you hit a certain key, it records it as a string of bits, of 0s and 1s. There's a fixed way of recording it, and that's called the ASCII code….that has nothing to do with cryptography, nothing to do with hiding the data, ok? Its just how the machines or whatever device needs to understand our language.
2. Following up to the previous question, now that you have explained to us the difference between coding and cryptography, can you please tell us how they are each used, as there seems to be much confusion about the usage of the two words?
*Coughs* Very often…the confusion arise because the laymen uses the word code to represent what we will call…er…and encryption technique. Whenever we talk about codes ok, we use it very loosely *coughs* to mean some way of hiding the information, secret codes if you like. Erm…so that's probably where the confusion arises, that if you want to be scientifically and technically correct, then er…cryptography is what we always think of just secret codes. Ok, but er…coding itself is just like what I told you, it's just a process of translating from if you like, from human thoughts to a machine understanding of a set of data. That's what coding is about. So, anything that does that translation is a code. Ok? Or put this in a way…supposed that there are only two pieces of information that I ever care about: Yes & No. ok? Now, supposed that I am living in a world, where everything must be represented in 0s and 1s. Ok? Then I must think of ways to represent these two pieces of information. So I can say 1 is yes and 0 is no, once I done this translation, over here, this is the code.
That is…any way of representing an information is a code.
3. Are we affected by codes in our daily life? If yes, how much are we affected by them and why is it so?
We are affected by it in many, many ways. Erm…let me ask you, let's go back to secret codes arh, crypto. Can you think of er what uses can you think of? […wars…game codes (Pokemon cheats)…] any others that you can think of? (Email) Have you heard of PGP? It's called Pretty Good Privacy…*coughs* it is a technique of encrypting, disguising email messages or any online transactions. So, for example if you are going to send your credit card number, by internet. Now you don't want to be sending it just like that. Otherwise anyone who gets hold of that number will start using that card right? So when you send sensitive information on the Net, they need to go through this encryption technique, usually they use PGP. They use some sophisticated mathematics to jumble up that number into something that is totally unrecognizable. But at the receiving end, so when the bank receives that jumbled bit of information, what they do…they have what they call the decoding algorithm, or the decrypting algorithm. So, they will try to trace back and see if the original number is the valid one. Related to that, I'm sure you use ATM cards? Not yet, but I'm sure you've seen your parents use it. When you go to the ATM and want to withdraw certain amount of money, when you key in your PIN, now…now, it's very important not to lose your PIN…but then what makes you so sure? For example, your PIN is 1234, you key in, and now surely the bank keeps records of all transactions, and when you key in 1234, the machine itself must know who to check whether 1234 is the correct PIN…ok, but it cannot simply just check 1234. If it does right, and it keeps in record…”ok, your card is whatever number it is, I have just checked, your PIN is 1234, correct, and you withdraw X amount.” And the next thing that is going to happen is, someone is going to steal your card, some employee in the bank looks at the records and you know, so and so had withdrew money in the bank and now I know his PIN is 1234, the next thing I do, is I start to do, is to check this guy down and to steal his ATM card, and I just key in 1234. Alright? There's no protection at all. What do they do? They do something very simple, ok, they jumble up all the PINs, so you key in 1234, inside the ATM machine, inside the chip, it does some complicated calculations, it jumbles up this number into something... it could be you know, 9999999, and then it checks against its database. It says “arh, 9999999 is correct PIN for this card” and therefore allows you the transaction. Now, when the employee come, he looks a the record, he's going to see a whole string of 9s, but he wouldn't have thought that that came from 1234. Ok? And that is the protection it gives. So, cryptography affects our everyday life, more than just wartime needs. Now, er…we talked about two issues just now: coding and cryptography. Now there are…apart from the secret codes ok, now what are the codes do we normally talk about? There is another kind of code that we call error correcting code. What are these? These are very commonly used. An error correcting code is the following. So, if you think of this very simple example again, yes is 1 and no is 0. Suppose that now we need to translate this information, either to a telephone line or satellite or whatever. Unfortunately when you use this modes of communications, there is interference. Interference, what they do, is… they can…flip these bits, ok? So suppose, you make an appointment with your friends “let's go for a movie, if its yes, send me a 1, if its no, send me a 0.” So he sends a 1, but unfortunately along the way, interference hits, and this 1 becomes a 0. ok? When you receive it, you wouldn't have imagined that an error has occurred. Correct? When you receive the 0, you would have thought “oh well, he said no”. Unfortunately, in reality, an error has occurred. So in everyday application, we need to be a bit more careful than that. How do you know if an error has occurred, and how do you correct it, become two rather important issues. Otherwise, you'll constantly be getting wrong information without realizing it. So, there's one way to make it detect an error more easily, and that's for example, you repeat the same digit once more. *coughs* Why do I say that this is slightly better? Because, ok, chances are, you are sending two identical bits, either two 1s or two 0s, assume that along the way, interference hits again, and chances are it will affect one, we must make some assumptions lar har, we cannot assume that all the bits get screwed up and that sort of things, if that's so, then we are living in a rather chaotic world. Ok, but if you assume that most of the times, things work nicely, but occasionally this get screwed up. Then suppose you send two 1s, and interference hits, but not the worst case scenario, where both of them becomes 0, but 1 of them become 0. So, in that case, this 11 becomes 10 or 01. now, when you receive 10, you say” I don't understand this, because in my world, there are only two pieces of information, either 11 which says yes, or 00 which says no. And now, I receive 10, which is neither this one nor this one.” So, what logical conclusions can you make? That there has been an interference. But of course we don't know which one has been interfered with. Correct? But we know there has been an interference. So, the least we could do is to tell the other party “ok, send me again, something got screwed up.” We have done one step better, and now we can detect an error. Ok, this is important, because in most communications, you want to make sure the data is not corrupted. Now, there is still a shortcoming with this particular code. It can detect, but it cannot correct. Any ideas how we can make it correct the error? ...What did we do to improve the usefulness of the code just now? Add one digits. So, if you want to improve it further, I mean, that may not be the best way, but you add one more. In terms of efficiency, it's rather stupid, we could have just used one bit to represent the information, but now we are using three, rather wasteful. But, suppose now we use triple 1 to represent yes and triple 0 to represent no. Ok, let's go through the same assumption again, that most of the time things are ok, but occasionally one bit gets screwed up. Suppose again I send triple 1, along the way interference hits, but doesn't hit too much, so it hits just one fellow. The assumption is it hits rather few. Ok, two is too many; two out of three is a lot. So suppose it hits one of them, so you are going to end up for example, let's say 101. Ok? So, when you receive the signal, you get 101, now just as before, we can pair with this two, and we know that this is not a valid string. Agree? I know now that an error has occurred, but do I know the original message should have been. “Yes”, because the assumption is that it hits as few as possible rite? So, starting with this one (YES 111) and ending up here (message received 101), would involve just changing 1; and starting from this one (NO 000) and ending up here (message received 101), would involve changing 2. If our assumptions is as few errors as possible occurred. Then which would be the logical choice. “Yes”. O even though we have received 101, ok under our assumption that normally things are quite ok, we would have been able to guess that the original message was “Yes”. Now, such a code, this is what we call a code arh, this is called an error correcting code, because when an error occurs, it can correct it back. Now, where did I go through this is such a great detail? Because this is used everywhere in our lives without us knowing it. It is in disguise. It's in the computer ok? Because, computers involve a lot of translation of information, inside the chip, and whenever there's a mistake, the consequences can be disastrous. So, what the manufacturers have built into the computers is this code that can help to detect one error, so each keystroke is represented by a certain number, a string of certain length, and so whatever information you keyed in, is going to be stored or transmitted as a whole long string. They are sort of broken up and interpreted as certain code words. Now, the way they have chosen to represent this pieces of information is such that when there is one bit in error, it will automatically correct itself back to the right thing. Its somewhat like…I'm sure u used Word (Microsoft Word) right? It's somewhat like a spell check you know, sometimes you happen to type a slight mistake right and it just corrects itself as you move on, it works somewhat like that. Ok? So it's hidden in the computers, it's hidden in CDs, CD players. Same thing. Why? CDs encode a lot of information in there and the way CD players read is to use lasers. What happens is that it is just a whole string of 0s and 1s, I don't know if you have learnt about that, if you look at a CD, you cut open a CD arh, and put it under a microscope, you will see a lot of ups and downs. Ok, that's how it looks lie, so I don't remember which is 1 which is 0, but one is 1 and one is 0.so you just read it and that will give you a whole string of 1s and 0s. Ok? That's how CD players and the Disc drives read CDs. But there's one problem, if you make a scratch on the CD, that's going to distort the 1s and 0s. Correct? What happens? If its music, no big deal, at most wrong notes here and there, but if it's a CD-Rom, which contains data, that's disastrous, it will just read, read wrong things ok? Or it will give you distorted information. So, that we have to be a bit more careful. So, that's why in CD players, an error correcting code is built into it. It's built into it such that when there's a scratch or when there's dirt, it won't affect, generally speaking. Of course, if you start scratching the entire CD, that's a totally different story altogether. Well, if it's just an unintentional scratch, it will be ok. So, that's hidden in our everyday life. DVD works just the same as CDs, but the quality is better.
4. Following up to the previous question, do you think that many people take codes and cryptography for granted in their lives, and as such ignore or do not bother to find out more about it? Why is this so?
Follow-up: If answer is they do not care or do not know about coding and cryptography,
What will happen if this goes on?
Absolutely right. Because…most of the time, we can't be bothered to ask why things work right. Most of us, we just, I mean, its not just with codes and cryptography, but with anything right, for example, how often do you ask yourself, how does a computer work, how does a recorder work, how does handphone work, how does the mp3 work? I don't think we really ask ourselves such questions right. So, that's human nature.
5. What do you think coding and cryptography would be like in the future? Why so?
Cryptography for sure, is going to be more n more important, for the simple reasons that we are likely to use more and more electronic transactions. So, because of that cryptography is going to be very, very important. Coding, er… as in like error correcting codes will continue to have at least the same importance, because again, like I said, in all these electronic transmission, we cannot do without error correcting codes to ensure the data is not corrupted too much. So, but…*coughs*…my feeling is that cryptography will grow a bit faster than coding, because I think at the moment, most practitioners, they are quite happy with the state of error correcting capabilities, but not so much for crypto.
6. Is number theory related to coding and cryptography? How so?
C = P^e mod NAbsolutely. How? Ok, erm, how, that's a good question….there is a very famous... ok, there is a very famous er……way of encryption called the RSA, I don't know if anyone of you has heard of it? Yah? It stands for …it's the initials of three people's surname, are for Rivest, S for Shamir and A for Adleman, this were three professors from MIT. I'm not sure if they are still al in MIT now. They designed the simple er... crypto system that is called a public key crypto system. Now, there are two types… have you heard of public key crypto system? *coughs*. The secret key and the public key. The secret key works in the most intuitive way, ok, like the shifting of the positions in the alphabet one position forward, you know…so for example you move the positions by one right? So then, to decrypt it, what you do is to move back, just reverse the process. Now, the public key is a totally different ball game, how does it work? Erm… *cough*, knowing how to encrypt the message, does not give you any advantage in decrypting or deciphering……er, this is highly non-intuitive right , so this is where we start to use…er rather sophisticated ideas, but very simple mathematics. There is so... this RSA erm, it works like this, I don't know, have you done modulo mathematics before? Not yet? Ok never mind, so what it does is this, it takes two huge prime numbers, p and q, ok, it takes two huge numbers, p and q, multiply them and that's called N. ok? And then what they does is this, suppose I have a message I call it P, its just a number ok, and erm, what they ……what they does is this, everyone of us in this room, suppose we want to play this game of sending messages through public key. Every one of us will make our own p and our own q, every one of us will take two big prime numbers and multiply and publish your telephone directory if you like, N. ok? Together with another number e, er... there's not much requirement for e except that p cannot divide e and q cannot divide er…no ... sorry ... except that …that GCD or the HCF, remember your HCF? It's a very strange condition, when (p-1) (q-1) and e, the GCD or the HCF is 1. Every one of us will pick…oh GCD is greatest common divisor or highest common factor. Same thing, I don't know what you call them in school this days…HCF? Well, ok, so everyone of us will pick our favourite N and e, ok N comes from multiplying two big prime numbers, and e is a number that cannot have common factors with (p-1) and (q-1), ok? Then what we do is this, we publish this, just like our phone numbers, you know, publish in the phone directory or on the internet, and you tell the whole world: “ok, if you want to send messages to me, then you use these two pieces of data.” So what they do is this, suppose I have a P which is my information, P is just…just think of it as a number, ok? You calculate P to the power of e; it's a huge number…ok? You take its remainder, where it divides by N, so never mind about this quotation, what this means is just you take this huge number, divide by N, keep this remainder…ok? Then what you do, that remainder lets call it C. you send C. I receive C. so suppose now you are going to send a message, so you calculate, calculate, you get a C you send it to me. When I receive my C, I am going to take a certain d power…ok? The D power is related to this e, ok? Only I know how to calculate it, because I m the one who chose the p and q correct or not? I m the one who chose, you cannot tell the world what your p and q are arh, if you tell them, the game is gone…ok? You can tell them N and e, so because I know my p and q, I can calculate this special d, and what I do, is when I receive my C, I am going to raise it to the d power, and I am going to keep its remainder when it divide by N…have I lost you, feel free to stop me, you know, if I lost you? Now that will give you P, now let me give you an example. *takes out blank paper* take some examples, ok? Big examples are nasty, let's take p to be 5, and q to be 7, ok in reality we don't take such small numbers; we will take 128 bits of numbers, huge numbers. Ok, so N is 35 correct? But what is (p-1)? (p-1) is 4, and (q-1) is 6, ok? So I need a number which is relatively prime to 24, er which has no common factor with them. So, it's what? 23 for example, right? But 23 is a bit big lah, you are going to raise it to 23 you know,*laughs* ok, erm yah 5. 5 is a nice one right, so let's take 5, 5 is quite small, in fact it's the smallest one. Now, it so happens that, I know but I won't tell you why, but the d in this case is also 5. Ok? If you take … e to be 5, d is also 5. It's not always equal arh, it's a total fluke. Ok, so suppose we start with the message; let's say, 2, ok? Let's say the message “F” is 2, so you are going to send it to me, but you are not going to send 2 arh, so you take 2 to the 5 and then divide by 35, keep the remainder. 2 to the power of 5 is what? 32., so 32 divided by 35 keep the remainder, that is still 32, so C is 32. Ok? C is 32. So this is the number you want to send to me, this is the encrypted form. Now I receive 32, what am I going to do? This is my secret weapon, *coughs* ok; this d is my secret weapon, so when I receive 32 I am going to raise it to the 5 th power, now this is pretty big……ok, this is pretty... pretty big*coughs* (laughs at calculator), this is not impossible, it can still be handled by calculators…….33554432, correct arh? Now I divide by 35 and I keep the remainder… (Uses computer calculator)…14…that's wrong, where did I go wrong? …… (Double checks calculator) maybe it's a rounding error...no this cannot be rounding error……arh I made a mistake… er…33445532…sorry, sorry, so this number divided by 35, is some number: *.057blahblahblah. Ok? So…yah, so the remainder only comes from this part right, so this answer is two. The remainder is 2, correct or not, the remainder is because of here right, so the question is what number divided by 35 gives you this (the decimal places) right, and its true. That means that remainder is 2, ok? I am sure you guys know programming right? Huh? That's god enough; I mean instead of getting calculators and so on, you can write a very simple program, right just do the division get rid of the integer part, and then this part you multiply by 35 to get back the remainder, ok? So, this is 2, and you see I have retrieved the original. Now this is a very, very simple technique from the RSA…and there's a big company that depend on this simple algorithm to make huge bucks, it's called RSA. This company is called RSA; it's based in the Silicon Valley . Because they discovered this algorithm, they patented it, you know and they transformed it into workable forms, and they made big bucks. Now this is the public key cryptography. Ok? Erm, now I forgot why I came all the way into the secret key and public key and all that, what is the question? ...arh, there you c, that's number theory, that's the easiest sort of number theory. But er, this people use more and more sophisticated number theory, but this is something which I am sure you can understand.
7. Following up to the previous question, what other relevant (very important) mathematical concepts linked to coding and cryptography do you think there are? In your opinion, which is the most important concept?
Oh everything…erm...really everything. Yes, combinatorics … *laughs* com-bi-na-to-rics, yah, c-o-m-b-i-n-a-t-o-r-i-c-s. It's about how you mix and match things. Ok? Erm, algebra, its very important erm… these are very important, but there are also other important theories, like probabilities. Ok? That is important to analyse the errors and so on.
8. Until now, unbreakable codes have been created by someone, and at other times broken by someone else. Which have been the more “powerful” or important since the past? Which do you think will win in the “race”, the code-makers or code-breakers? Why so?
Beats me, beats me, because er *coughs*, we continue to get surprises…in fact; one area I forgot to mention is quantum theories, quantum physics. Ever heard of quantum physics? Yea, quantum physics is used to design some of the latest idea of codes, but unfortunately quantum theory is still a theory, it's very difficult to realize it. Up to now, we have only built a quantum computer that can factorise up to 16, if you give it a number up to 16, it can factorise it, give it a bigger number it doesn't know how to factorise…factorise mean getting all the prime factors out. *Coughs* so the quantum theory, that is it so far.
*Laugh* breaking it won't make you money right, *laughs* the challenge is always to make a… a, so here when we are talking about codes, we are always talking about secret codes arh…the challenge is always to make a kind of code that no one can ever break…what is true is that up till now there is no code that people have ever designed that they can prove with 100% certainty that no one can ever break systematically……that is really the biggest challenge…all codes that people made, designed *coughs* er can be proved to be quite safe, if you make certain assumptions. So, let me give you an example, like this RSA method that I just told you about, it's incredibly simple, you know, it depends on one very important assumption…it…it assumes that factorising a number is very difficult……what do I mean by that ? Ok? If I give you a small number like 6, and ask you to factorise it. Big deal, immediately you tell me 2 times 3 *coughs*, even if I give you something bigger, you know, like 54, yes you can start peeling off the factors one by one. Ok? *coughs* But if I give you 144169 and ask you to give me all the prime factors, that's going to take you quite a bit, and it's not a coincidence, factorising a number is very difficult, *coughs* or at least most people believe that it is very difficult, in the sense that no one has been able to do it in a very smart and clever and fast way so far. Ok, now this RSA crypto system, we call it, *coughs* depends on this on assumption……that factorising a number is very difficult, because why? Everything voice down to this N, which is made up of the product of two prime numbers. If I can come up with a very fast and clever way of factorising all numbers, then the moment I see this N, I say, “oh, here are my p and q, if I know my p and q, I can calculate my (p-1) and (q-1) and that actually there a very systematic way to derive e. Ok? Once, we know “(p-1), (q-1) and e” there's a very systematic way to calculate the d. I can do it…so you know the d, if you know the d, you can do the decrypting already, ok? So, the moment you have a fast way of factorising a big number, I tell you, this RSA Company will go bankrupt overnight. So, that ...that is where the challenge lies.
9. What do codes do and how they can they improve someone's daily work?
In the obvious way, you know, the secret codes right, crypto system, er if they are easy to use, and they are efficient, and they are very safe, that will enhance our security a lot, so you can feel safe, you know, doing all your banking on the internet and not to worry about hackers and so on *coughs* or even for military purposes, you will feel a lot safer right, as your … your er secrecy is ensured and… in terms of other codes like the error correcting codes I told you about, it will arh, *coughs* have again more efficient way of detecting and correcting errors, which will lead to more…more efficient products lah.
10. In the past, are there any major historical events that are affected by codes in some way or another? What is it, and why is it affected so much by them?
Major historical events arh? *coughs* ……I am trying to think, nothing spins to my mind, (Bi Ran: maybe this country develops a sort of code and uses it to transmit secret messages over, and then they win a war because their enemies cannot understand)...oh that. That, I am sure happens most of the time, but er…you might just want to search the internet…like all those Caesar ciphers, they might have some stories that go around them…ok *coughs* Most of the time in the field of crypto, you know, any news you hear is bad news. Because when a code works well, everyone is just going to keep quiet about it but when you crack a system, when and you crack a code *coughs* then the whole world gets to hear about it. You see …erm that's the end of all that supposedly brilliant idea, so…but I think most people will just keep quiet because it doesn't do them any good what…except to…to…boost their own ego a little bit…if you can crack other people's code, what good does it do to you. ok? If you shout about it, then of course there is going to be a change in the code, and then your fun is gone, right? But if you keep quiet about it, you can continue to keep breaking it……You've run out of questions, is it? *laughs*
Do you have any other information you want to tell us about with regards to our topic?
Well, I can show you a few sites arh………I can show you how a DVD player works…this picture has a name arh, now I am going to choose the speed at which the…correction ...I am going to introduce some errors in this picture afterwards, and then I am going to correct it, now what you see is this, the ways codes are used to correct errors, is by using what we call redundant information, just like just now the yes and no code arh, when it's very plain arh, with no extra information, it's just 0 and 1, it cannot correct, but if you introduce a lot of redundant information, so I repeat the 1 several times and the 0 several times, then suddenly it ... it er gives me this extra ability to detect and correct the mistakes, so this things that you see here , these dots here, are basically the extra bits of information that we put into the picture to allow us to correct mistakes later. Ok? So, now we introduce errors, so let's take... er… a laser blade and start disfiguring her (refers to DVD picture on computer) Ok? And now, we correct, so let's… now you see er, it's doing the correction, it's cleaning up her face…you see all the scars have gone, 100% of the errors were corrected...you see that? That is how the CD and DVD works imagine the same scratch right. So, now, suppose I continue and I become a bit more vicious and disfigure her in ... in such a terrible way, right , and I am trying to give her some cosmetic surgery…look at this she's only 30% restored right, or permanently disfigured. Again, this gives you an illustration of how the DVD or CD works, if you start scratching your CD like that right, and then there will come a point where it's beyond the point of no return, so you can never retrieve that data anymore. Ok? Decode is the same, oh it asks it to clean up further……this are the errors, so this is...is an example, let's see… any other sites…these are all on error correcting codes , this is an interesting articles, ok these are not for the secret codes arh, these are not. This article might still be there...yes it tells you how it is being used in application, the key page is this, my old web page In the NUS maths website, then you go to the advanced coding theories, and there's some links. Reed Solomon code is the one used in CDs. Shannon was the one who started the error correcting codes. Hemming is an interesting guy, he also started implementing this error correcting codes, ok, he started working and…well this article is pretty wordy, so I don't think you would be so interested. But, there's this site arh, http://www.howstuffworks.com , ok there's some good information regarding coding and cryptography.