------------------------------------------------------------------------
iconv
------------------------------------------------------------------------

The original character coding, ASCII (values 0 to 127), was developed for
teletype machines (and, before that, the telegraph). A single byte was
sufficient (with the upper half unused). IBM developed EBCDIC (also
1 byte), which does not include ASCII as a subset. Since then the need to
encode other languages has led to a large number of character codings.

"iconv" allows you to map one coding to another. The codings that iconv
can handle are listed by

    $ iconv -l                #prepare to be impressed

The general form of usage is

    $ iconv -f old-encoding -t new-encoding Infile > Outfile

The only use I have made of iconv is to get rid of all characters other
than the "lower half of ASCII" (0 to 127).

    $ iconv -t ASCII -c Infile > Outfile

    -c       ... characters which cannot be converted are silently discarded
    -t ASCII ... the output coding is standard (7-bit) ASCII

With no -f, iconv assumes the input is in the coding of the current locale.

There are other solutions to this task

    $ LC_ALL=C tr -dc '\0-\177' < input_file > output_file

Note: there is no ";" after LC_ALL=C, even though you may be tempted to
add one. Written as a prefix, the assignment is placed in the environment
of that one tr command only.

To see the current "locale" settings

    $ locale

BACKGROUND:

The first character coding was ASCII (American Standard Code for
Information Interchange), which traces its roots back to the days of
telegraphy. It had only 128 characters, mapped to [0,127].

    $ man ascii

The first 32 characters (0-31) and the last character (127) are "control"
characters and trace their heritage to the days of telegraphy (and
teletype machines). Of these the following are still in use:

      0 (NUL, \0, ^@)          marks the end of a string
      7 (BEL, \a, ^G)          bell
      8 (BS, \b, ^H)           backspace
      9 (HT, \t, ^I)           horizontal tab
     10 (LF, \n, ^J)           linefeed
     11 (VT, \v, ^K)           vertical tab
     12 (FF, \f, ^L)           form feed (eject paper, clear screen)
     13 (CR, \r, ^M)           carriage return
     26 (Control-Z, EOF, ^Z)
     27 (escape, ESC, ^[)
    127 (delete, DEL, ^?)

Characters with values between 32 and 126 are used for symbols.
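
The two uses of iconv and tr described earlier can be tried on a tiny
file. This is only a sketch, assuming a POSIX shell with iconv, od and tr
installed; the file names latin1.txt, utf8.txt and ascii.txt are made up
for the illustration.

```shell
# Sample input: "café" in ISO-8859-1, where é is the single byte 0xE9
# (octal 351). The file names are hypothetical.
printf 'caf\351\n' > latin1.txt

# Map one coding to another: ISO-8859-1 in, UTF-8 out.
# In UTF-8, é takes two bytes: 0xC3 0xA9.
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt

# Dump the bytes of both files in hex to see the difference.
od -An -tx1 latin1.txt    # bytes: 63 61 66 e9 0a
od -An -tx1 utf8.txt      # bytes: 63 61 66 c3 a9 0a

# Keep only bytes 0-127 (the "lower half"); both bytes of é are dropped.
LC_ALL=C tr -dc '\0-\177' < utf8.txt > ascii.txt
cat ascii.txt             # prints: caf
```

The hex dump is the quickest way to check what a conversion really did,
since a terminal may display both files the same way or garble one of them.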