Previews Parsing

From Komic Box Docs
Revision as of 17:48, 22 October 2018 by Gskluzacek (Talk | contribs) (add old code section)

Jump to: navigation, search

Old PHP Code Base


Previews Parsing

Purpose

To take the previews order form and parse its contents into database tables

The Previews Web Site

The home page is located at previewsworld.com

They now have a digital version of previews which you can view on the web site or on a mobile app. Each issue is $3.99

Customer Order Form (COF)

The Customer Order Form (COF) can be downloaded in Text or PDF format from the Archive page. The have issues as far back as Jan 2012 on the archive page itself. However, it is posible to request the COFs between JAN 2010 and DEC 2011, by manually entering the URL.

The the URL for each COF follows the format below:

   https://www.previewsworld.com/Catalog/CustomerOrderForm/<format>/<MONYY>

where <format> is either PDF or TXT

and <MONYY> is the 3 letter month abbreviation and the 2 digit year.

Current Trends

The current trend is

  • LATIN-1 (or US-ASCII) encoding of the text
  • No blank lines before the FILE-HEADER line
  • No leading or trailing white space for the FILE-HEADER line
  • The month is spelled out
  • VOL. used as the abbreviation for Volume Number
  • Issue Number not left padded with Zeros
  • Two (2) blank lines after the FILE-HEADER line and before the PAGE line
  • No leading space for the PAGE line
  • No tailing 5 TAB characters for first PAGE line

Text Encoding

Nearly all texted format COFs are encoded with the US-ASCII (56*) encoding or LATIN-1, also known as ISO-8859-1 (43*) encoding with the following exceptions (* as of 2018/11 with 107 total issues from 2010/01 to 2018/11):

  • Windows 1252 (6*): FEB15, NOV14, AUG14, FEB14, DEC12, MAY12
  • UTF-8 with BOM (1*): DEC15
  • UTF-16 \[LE] (1*): JUN17

Note:

  • See differences between US-ASCII, LATIN-1 and WINDOWS-1256 Encodings below

File Layout General

The layout (with some exceptions) generally consists of a FILE-HEADER Line on the first line of the file, followed by some number of blank lines, followed by a PAGE line. As of 2018/11, with 119 issues from 2009/01 to 2018/11:

  • The majority (64) have 2 blank lines between the FILE-HEADER line and the first PAGE line (see item 2 below for AUG13; see item 3 below for NOV14)
  • 37 have 1 blank line (see item 2 below for JAN2013, DEC12; see item 4 below for MAR10)
  • 16 have 3 blank lines
  • 1 has 1 blank line
  • 1 did not have any FILE-HEADER line (see item 1 below for AUG17)

exceptions:

  • AUG17 - no FILE-HEADER line
  • AUG13, JAN13, DEC12 - OTHER line types after the header line but before the first PAGE line
  • NOV14 - PAGE line with with PAGE specified as 'AG' instead of 'PAGE'
  • MAR10 - 1 BLANK line before the HEADER line
FILE-HEADER Line Format

FILE-HEADER line format

  • Constant: 'PREVIEWS' starting in column 1
  • Followed by 1 space character
  • Followed by either the Month Name or Month Abbreviation (3 characters)
  • Followed by 1 space
  • Constant: either 'VOL' (with or without a trailing period) or 'V'
  • Followed by 1 space if VOL or VOL. or no spaces if V
  • Followed by the Volume Number: a 2 digit number that is equal to the issue year minus 1990 (yr - 1990 = vol_nbr)
  • Followed by 1 space
  • Followed by a pound sign '#'
  • Followed by the Issue Number (no intervening spaces): a 1 or 2 digit number, for issues in 2009, with values less than 10, are left padded with a zero (see notes below)
  • No trailing spaces

notes:

  • JUL14 - wrong issue number of 6 given, it should have been 7
  • JAN10 - wrong Month abbreviation of DEC given, it should have been JAN
  • AUG10 - wrong Month Name of JULY given, it should have been AUGUST
  • JAN12 - wrong Month Name of DECEMBER given, it should have been JANUARY
  • JAN13 - wrong Month Name of DECEMBER given, it should have been JANUARY
PAGE Line Format

PAGE line format

  • Constant: 'PAGE' starting in column 1
  • Followed by 1 space character
  • Followed by the Page Number (not left padded with zeroes)
  • Followed by either no trailing white space (first PAGE line of the file) or 5 trailing TAB characters (all other PAGE lines in the file) (see notes below)

notes:

  • JAN13 - this issue has non blank lines between the FILE-HEADER line and the PAGE line, which looks to be the reason why its first PAGE line has 5 trailing TAB characters. So its probably correct to assume that this issue is missing the first PAGE line.
  • JUL13 - the first PAGE line for this issue contains 5 trailing TAB characters.
  • NOV14 - PAGE line with with PAGE specified as 'AG' instead of 'PAGE'

File Locations

I have down loaded some of the text format COFs and have them located here JAN 2009 thru APR 2013

I also downloaded some of the PDF format COFs which are located here: JAN 2009 thru DEC 2010, SEP 2011, JAN 2012 thru OCT 2012 and JAN 2013 thru APR 2013

I also have a full compliment (both PDF and TEXT format from JAN09 to NOV18) of files on my local 5K iMac in the following directory: /Users/gregskluzacek/Documents/Development/Python/PreviewsParsing/downloads

Data Analysis

the following provides a summary of the FILE HEADER line and first PAGE line of each file

Month Year File Orig File New Encoding Blanks Before Blanks After Month Abbr 'VOL. ', 'VOL ', 'V' Iss Nbr Padded Page Hdr / Other
2018 11 NOV18_COF.txt cof_2018_11.txt LATIN-1 0 2 name VOL. (space) no
2018 10 OCT18_COF.txt cof_2018_10.txt LATIN-1 0 2 name VOL. (space) no
2018 09 SEP18_COF.txt cof_2018_09.txt LATIN-1 0 2 name VOL. (space) no
2018 08 AUG18_COF.txt cof_2018_08.txt LATIN-1 0 2 name VOL. (space) no
2018 07 JUL18_COF.txt cof_2018_07.txt LATIN-1 0 2 name VOL. (space) no
2018 06 JUN18_COF.txt cof_2018_06.txt LATIN-1 0 2 name VOL. (space) no
2018 05 MAY18_COF.txt cof_2018_05.txt LATIN-1 0 3 name VOL. (space) no
2018 04 APR18_COF.txt cof_2018_04.txt LATIN-1 0 4 name VOL. (space) no 4 blank lines after FILE HEADER line
2018 03 MAR18_COF.txt cof_2018_03.txt US-ASCII 0 3 name VOL. (space) no
2018 02 FEB18_COF.txt cof_2018_02.txt US-ASCII 0 2 name VOL. (space) no
2018 01 JAN18_COF.txt cof_2018_01.txt US-ASCII 0 2 name VOL. (space) no
2017 12 DEC17_COF.txt cof_2017_12.txt LATIN-1 0 3 name VOL. (space) no
2017 11 NOV17_COF.txt cof_2017_11.txt LATIN-1 0 2 name VOL. (space) no
2017 10 OCT17_COF.txt cof_2017_10.txt LATIN-1 0 2 name VOL. (space) no
2017 09 SEP17_COF.txt cof_2017_09.txt LATIN-1 0 2 name VOL. (space) no
2017 08 AUG17_COF.txt cof_2017_08.txt US-ASCII no file header line no file header line no file header line no file header line no file header line no file header line
2017 07 JUL17_COF.txt cof_2017_07.txt US-ASCII 0 2 name VOL. (space) no
2017 06 JUN17_COF.txt cof_2017_06.txt UTF-16-LE 0 2 name VOL. (space) no
2017 05 MAY17_COF.txt cof_2017_05.txt LATIN-1 0 3 name VOL. (space) no
2017 04 APR17_COF.txt cof_2017_04.txt LATIN-1 0 2 name VOL. (space) no
2017 03 MAR17_COF.txt cof_2017_03.txt LATIN-1 0 2 name VOL. (space) no
2017 02 FEB17_COF.txt cof_2017_02.txt LATIN-1 0 2 name VOL. (space) no
2017 01 JAN17_COF.txt cof_2017_01.txt US-ASCII 0 3 name VOL. (space) no
2016 12 DEC16_COF.txt cof_2016_12.txt US-ASCII 0 2 name VOL. (space) no
2016 11 NOV16_COF.txt cof_2016_11.txt US-ASCII 0 3 name VOL. (space) no
2016 10 OCT16_COF.txt cof_2016_10.txt LATIN-1 0 2 name VOL. (space) no
2016 09 SEP16_COF.txt cof_2016_09.txt US-ASCII 0 2 name VOL. (space) no
2016 08 AUG16_COF.txt cof_2016_08.txt US-ASCII 0 2 name VOL. (space) no
2016 07 JUL16_COF.txt cof_2016_07.txt LATIN-1 0 2 name VOL. (space) no
2016 06 JUN16_COF.txt cof_2016_06.txt US-ASCII 0 2 name VOL. (space) no
2016 05 MAY16_COF.txt cof_2016_05.txt US-ASCII 0 3 name VOL. (space) no
2016 04 APR16_COF.txt cof_2016_04.txt US-ASCII 0 3 name VOL. (space) no
2016 03 MAR16_COF.txt cof_2016_03.txt US-ASCII 0 3 name VOL. (space) no
2016 02 FEB16_COF.txt cof_2016_02.txt US-ASCII 0 3 name VOL. (space) no
2016 01 JAN16_COF.txt cof_2016_01.txt LATIN-1 0 2 name VOL. (space) no
2015 12 DEC15_COF.txt cof_2015_12.txt UTF-8-SIG 0 2 name VOL. (space) no
2015 11 NOV15_COF.txt cof_2015_11.txt US-ASCII 0 3 name VOL. (space) no
2015 10 OCT15_COF.txt cof_2015_10.txt LATIN-1 0 3 name VOL. (space) no
2015 09 SEP15_COF.txt cof_2015_09.txt LATIN-1 0 3 name VOL. (space) no
2015 08 AUG15_COF.txt cof_2015_08.txt US-ASCII 0 2 name VOL. (space) no
2015 07 JUL15_COF.txt cof_2015_07.txt US-ASCII 0 2 name VOL. (space) no
2015 06 JUN15_COF.txt cof_2015_06.txt US-ASCII 0 2 name VOL. (space) no
2015 05 MAY15_COF.txt cof_2015_05.txt LATIN-1 0 2 name VOL. (space) no
2015 04 APR15_COF.txt cof_2015_04.txt US-ASCII 0 2 name VOL. (space) no
2015 03 MAR15_COF.txt cof_2015_03.txt US-ASCII 0 2 name VOL. (space) no
2015 02 FEB15_COF.txt cof_2015_02.txt WINOWS-1252 0 2 name VOL. (space) no
2015 01 JAN15_COF.txt cof_2015_01.txt US-ASCII 0 2 name VOL. (space) no
2014 12 DEC14_COF.txt cof_2014_12.txt US-ASCII 0 2 name VOL. (space) no
2014 11 NOV14_COF.txt cof_2014_11.txt WINOWS-1252 0 2 name VOL. (space) no PAGE mis-specified as AG
2014 10 OCT14_COF.txt cof_2014_10.txt US-ASCII 0 2 name VOL. (space) no
2014 09 SEP14_COF.txt cof_2014_09.txt US-ASCII 0 1 name VOL. (space) no
2014 08 AUG14_COF.txt cof_2014_08.txt WINOWS-1252 0 2 name VOL. (space) no
2014 07 JUL14_COF.txt cof_2014_07.txt US-ASCII 0 2 name VOL. (space) no incorrect issue number of 6, should be 7
2014 06 JUN14_COF.txt cof_2014_06.txt US-ASCII 0 3 name VOL. (space) no
2014 05 MAY14_COF.txt cof_2014_05.txt LATIN-1 0 1 name VOL. (space) no
2014 04 APR14_COF.txt cof_2014_04.txt LATIN-1 0 2 abr VOL. (space) no
2014 03 MAR14_COF.txt cof_2014_03.txt US-ASCII 0 2 abr VOL. (space) no
2014 02 FEB14_COF.txt cof_2014_02.txt WINOWS-1252 0 1 abr VOL. (space) no
2014 01 JAN14_COF.txt cof_2014_01.txt US-ASCII 0 2 abr VOL. (space) no
2013 12 DEC13_COF.txt cof_2013_12.txt LATIN-1 0 2 abr VOL. (space) no
2013 11 NOV13_COF.txt cof_2013_11.txt US-ASCII 0 2 abr VOL. (space) no
2013 10 OCT13_COF.txt cof_2013_10.txt US-ASCII 0 1 abr VOL. (space) no
2013 09 SEP13_COF.txt cof_2013_09.txt US-ASCII 0 2 name VOL. (space) no
2013 08 AUG13_COF.txt cof_2013_08.txt US-ASCII 0 2 name VOL. (space) no 1 non blank line before PAGE line
2013 07 JUL13_COF.txt cof_2013_07.txt US-ASCII 0 1 name VOL. (space) no first PAGE line has 5 trailing TAB characters
2013 06 JUN13_COF.txt cof_2013_06.txt US-ASCII 0 1 name VOL. (space) no
2013 05 MAY13_COF.txt cof_2013_05.txt US-ASCII 0 2 name VOL. (space) no
2013 04 APR13_COF.txt cof_2013_04.txt US-ASCII 0 2 name VOL. (space) no
2013 03 MAR13_COF.txt cof_2013_03.txt US-ASCII 0 2 name VOL. (space) no
2013 02 FEB13_COF.txt cof_2013_02.txt US-ASCII 0 1 name VOL. (space) no
2013 01 JAN13_COF.txt cof_2013_01.txt US-ASCII 0 1 name VOL. (space) no 2 non blank lines before PAGE line; incorrect month of DECEMBER, should be JANUARY; first PAGE line has 5 trailing TAB characters
2012 12 DEC12_COF.txt cof_2012_12.txt WINOWS-1252 0 1 name VOL. (space) no 25 non blank lines before PAGE line
2012 11 NOV12_COF.txt cof_2012_11.txt US-ASCII 0 1 name VOL. (space) no
2012 10 OCT12_COF.txt cof_2012_10.txt US-ASCII 0 1 name VOL. (space) no
2012 09 SEP12_COF.txt cof_2012_09.txt US-ASCII 0 1 name VOL. (space) no
2012 08 AUG12_COF.txt cof_2012_08.txt US-ASCII 0 1 name VOL. (space) no
2012 07 JUL12_COF.txt cof_2012_07.txt US-ASCII 0 3 name VOL. (space) no
2012 06 JUN12_COF.txt cof_2012_06.txt US-ASCII 0 2 name VOL. (space) no
2012 05 MAY12_COF.txt cof_2012_05.txt WINOWS-1252 0 2 name VOL. (space) no
2012 04 APR12_COF.txt cof_2012_04.txt US-ASCII 0 2 name VOL. (space) no
2012 03 MAR12_COF.txt cof_2012_03.txt LATIN-1 0 2 name VOL. (space) no
2012 02 FEB12_COF.txt cof_2012_02.txt LATIN-1 0 1 name VOL. (space) no
2012 01 JAN12_COF.txt cof_2012_01.txt LATIN-1 0 2 name VOL. (space) no incorrect month of DECEMBER, should be JANUARY
2011 12 DEC11_COF.txt cof_2011_12.txt LATIN-1 0 1 name VOL. (space) no
2011 11 NOV11_COF.txt cof_2011_11.txt US-ASCII 0 2 name VOL. (space) no
2011 10 OCT11_COF.txt cof_2011_10.txt LATIN-1 0 2 name VOL. (space) no
2011 09 SEP11_COF.txt cof_2011_09.txt US-ASCII 0 3 name VOL. (space) no
2011 08 AUG11_COF.txt cof_2011_08.txt LATIN-1 0 2 name VOL. (space) no
2011 07 JUL11_COF.txt cof_2011_07.txt LATIN-1 0 2 name VOL. (space) no
2011 06 JUN11_COF.txt cof_2011_06.txt US-ASCII 0 2 name VOL. (space) no
2011 05 MAY11_COF.txt cof_2011_05.txt US-ASCII 0 1 name VOL. (space) no
2011 04 APR11_COF.txt cof_2011_04.txt LATIN-1 0 1 name VOL. (space) no
2011 03 MAR11_COF.txt cof_2011_03.txt LATIN-1 0 1 name VOL. (space) no
2011 02 FEB11_COF.txt cof_2011_02.txt LATIN-1 0 1 name VOL. (space) no
2011 01 JAN11_COF.txt cof_2011_01.txt LATIN-1 0 1 name VOL. (space) no
2010 12 DEC10_COF.txt cof_2010_12.txt LATIN-1 0 1 name V (no space) no
2010 11 NOV10_COF.txt cof_2010_11.txt LATIN-1 0 2 name V (no space) no
2010 10 OCT10_COF.txt cof_2010_10.txt US-ASCII 0 1 name V (no space) no
2010 09 SEP10_COF.txt cof_2010_09.txt LATIN-1 0 2 name V (no space) no
2010 08 AUG10_COF.txt cof_2010_08.txt US-ASCII 0 2 name V (no space) no incorrect month of JULY, should be AUGUST
2010 07 JUL10_COF.txt cof_2010_07.txt LATIN-1 0 2 name V (no space) no
2010 06 JUN10_COF.txt cof_2010_06.txt US-ASCII 0 2 name V (no space) no
2010 05 MAY10_COF.txt cof_2010_05.txt US-ASCII 0 2 name V (no space) no
2010 04 APR10_COF.txt cof_2010_04.txt LATIN-1 0 1 abr V (no space) no
2010 03 MAR10_COF.txt cof_2010_03.txt US-ASCII 1 1 abr V (no space) no 1 blank line before FILE HEADER line
2010 02 FEB10_COF.txt cof_2010_02.txt LATIN-1 0 1 abr V (no space) no
2010 01 JAN10_COF.txt cof_2010_01.txt LATIN-1 0 1 abr V (no space) no incorrect month abbr of DEC, should be JAN
2009 12 DEC09_COF.txt cof_2009_12.txt unknown 0 1 abr VOL (space no period) no
2009 11 NOV09_COF.txt cof_2009_11.txt unknown 0 2 abr VOL (space no period) no
2009 10 OCT09_COF.txt cof_2009_10.txt unknown 0 1 abr VOL (space no period) no
2009 09 SEP09_COF.txt cof_2009_09.txt unknown 0 1 abr VOL (space no period) yes
2009 08 AUG09_COF.txt cof_2009_08.txt unknown 0 1 abr VOL (space no period) yes
2009 07 JUL09_COF.txt cof_2009_07.txt unknown 0 1 abr VOL (space no period) yes
2009 06 JUN09_COF.txt cof_2009_06.txt unknown 0 1 abr VOL (space no period) yes
2009 05 MAY09_COF.txt cof_2009_05.txt unknown 0 1 abr VOL (space no period) yes
2009 04 APR09_COF.txt cof_2009_04.txt unknown 0 1 abr VOL (space no period) yes
2009 03 MAR09_COF.txt cof_2009_03.txt unknown 0 1 abr VOL (space no period) yes
2009 02 FEB09_COF.txt cof_2009_02.txt unknown 0 1 abr VOL (space no period) yes
2009 01 JAN09_COF.txt cof_2009_01.txt unknown 0 1 abr VOL (space no period) yes

High Level Functions

File Loader

Main

set the base URL and Path to store the downloaded files

   url_base = 'https://www.previewsworld.com/Catalog/CustomerOrderForm/{}/{}'
   path_name = '/Users/gregskluzacek/Documents/Development/Python/PreviewsParsing/downloads'

Set the latest date (current month) to process and the number of months to process (working backwards).

   beg_dt = datetime(2018, 11, 1)
   months = 108

Iterate over each month

   for i in range(1, months + 1):

set the date string of the URL

       dt_str = beg_dt.strftime('%b%y').upper()

set the URLs for the TEXT and PDF formates

       url = url_base.format('TXT', dt_str, True)
       url = url_base.format('PDF', dt_str, True)

in separate try except blocks call

       get_text_file(url, path_name)
       get_pdf_file(url, path_name)

at the end of the loop, decrement the date variables

       beg_dt = beg_dt - relativedelta(months=1)

get_file_name

Passed a requests request-response object, and tries to get the Content-Disposition header from the response

   hdr = r.headers['Content-Disposition']

The Content-Disposition header has the format of: "attachment; filename=MMMYY_COF.ext" where:

  • MMM - is the 3 letter Month Abbreviation,
  • YY - is the 2 digit year, and
  • ext - is the file extension, either txt or pdf

Parse the 'attachment;' portion of the header and validate that its does equal 'attachment'

   vals_lvl_1 = hdr.split(';')
   if len(vals_lvl_1) != 2 or vals_lvl_1[0] != 'attachment':
       raise Exception(...)

Parse the filename key value and validate the key does equal 'filename'

   vals_lvl_2 = vals_lvl_1[1].split('=')
   if len(vals_lvl_2) != 2 or vals_lvl_2[0].strip() != 'filename':
       raise Exception(...)

Get the first 3 characters (the Month Abbreviation) and try to convert it to a valid numeric month

   orig_file = vals_lvl_2[1]
   month_txt = orig_file[:3]
   try:
       month_num = time.strptime(month_txt, '%b').tm_mon
   except ValueError as err:
       raise Exception(...)

1) Convert the numeric month into a left padded, zero filled string (e.g., 01, 02, 03... 09, 10, 11, 12) -- 2) add the century onto the beginning of the 2 digit year -- 3) and get the file extension (txt or pdf)

   mm = str(month_num).zfill(2)
   yyyy = '20' + orig_file[3:5]
   ext = orig_file[-3:]

Reformat & return the filename as: "cof_YYYY_MM.ext"

   return 'cof_{}_{}.{}'.format(yyyy, mm, ext)

get_text_file

pass in: url, file_path, validate_only=False

create a get request to fetch the file

   with requests.get(url) as req:

Checks to make sure the Content-Type is plain/text

       if req.headers.get('Content-Type') != 'text/plain':
           raise Exception(...)

sets the request encoding to the Apparent Encoding (The apparent encoding, provided by the chardet library)

       req.encoding = req.apparent_encoding

Validates text is greater than 100 characters and validates the first line or first and second lines

       if len(text) < 100:
           raise Exception("...)
       if req.text[:8] != 'PREVIEWS' and req.text[:4] != 'PAGE' and req.text[2:10] != 'PREVIEWS':
           raise Exception(...)

Save the file if the validate_only flag is set to false:

       if not validate_only:
           fn = path.join(file_path, get_file_name(req))
           with open(fn, 'wt', encoding='UTF-8') as fh:
               fh.write(text)

get_pdf_file

pass in: url, file_path, validate_only=False

create a get request to fetch the file

   with requests.get(url) as req:

Checks to make sure the Content-Type is application/pdf

       if req.headers.get('Content-Type') != 'application/pdf':
           raise Exception(...)

Validates text is greater than 100 characters and validates the first line

       if len(text) < 100:
           raise Exception("...)
       if req.content[:4] != b'%PDF':
           raise Exception(...)

Save the file if the validate_only flag is set to false:

       if not validate_only:
           fn = path.join(file_path, get_file_name(req))
           with open(fn, 'wb') as fh:
               fh.write(req.content)

Parsing of Loaded Data

Differences Between Encodings

US-ASCII

Basic character set which uses codes from hex 00 thru 7F. Codes between hex 80 and FF are undefined

HEX Char Description
00 NUL Null
01 SOH Start Of Heading
02 STX Start Of Text
03 ETX End Of Text
04 EOT End of Transmission
05 ENQ Enquiry
06 ACK Acknowledgement
07 BEL Bell
08 BS Backsapce
09 HT Horizontal Tab
0A LF Line Feed
0B VT Vertical Tab
0C FF Form Feed
0D CR Carriage Return
E0 SO Shift Out
0F SI Shift In
10 DLE Data Link Escape
11 DC1 Xon (device control 1)
12 DC2 (device control 2)
13 DC3 Xoff (device control 3)
14 DC4 (device control 4)
15 NAK Negative Acknowledgement
16 SYN Synchronous Idle
17 ETB End Of Transmission Block
18 CAN Cancel
19 EM End Of Medium
1A SUB Substitute
1B ESC Escape
1C FS File separator
1D GS Group Separator
1E RS Record Separator
1F US Unit Separator
20 SP Space
21  ! Exclamation Point
22 " Double Quote
23 # Pound Sign
24 $ Dollar Sign (currency)
25  % Per-Cent
26 & Ampersand
27 ' Single Quote (Apostrophe)
28 ( Parentheses Left
29 ) Parentheses Right
2A * Asterisk
2B + Plus Sign
2C , Comma
2D - Dash or Minus Sign (math)
2E . Period
2F / Forward Slash
30 0
31 1
32 2
33 3
34 4
35 5
36 6
37 7
38 8
39 9
3A  : Colon
3B  ; Semi Colon
3C < Less Than Sign (math)
3D = Equal Sign (math)
3E > Greater Than Sign (math)
3F  ? Question Mark
40 @ At Sign (at the rate of)
41 A
42 B
43 C
44 D
45 E
46 F
47 G
48 H
49 I
4A J
4B K
4C L
4D M
4E N
4F O
50 P
51 Q
52 R
53 S
54 T
55 U
56 V
57 W
58 X
59 Y
5A Z
5B [ Square bracket Left
5C \ Backslash
5D ] Square bracket Right
5E ^ Caret
5F _ Underscore
60 ` Grave Accent
61 a
62 b
63 c
64 d
65 e
66 f
67 g
68 h
69 i
6A j
6B k
6C l
6D m
6E n
6F o
70 p
71 q
72 r
73 s
74 t
75 u
76 v
77 w
78 x
79 y
7A z
7B { Curly Brace Left
7C Pipe
7D } Curly Brace Right
7E ~ Tilde
7F DEL Delete

LATIN-1

Also known as ISO-8859-1, extends the US-ASCII encoding by adding additional characters from hex A0 thru FF

HEX Char Description
A0 NBSP Non Breaking Space
A1 ¡ Inverted Exclamation Point
A2 ¢ Cent Sign (currency)
A3 £ Pound Sign (currency)
A4 ¤ Unspecified Currency Sign
A5 ¥ Yen Sign (currency)
A6 ¦ Vertical Bar
A7 § Section Sign
A8 ¨ Diaeresis
A9 © Copyright Symbol
AA ª Ordinal indicator
AB «  Angle Quote Double Left
AC ¬ Negation (Logical Compliment)
AD SHY Soft Hyphen
AE ® Registered Trademark Symbol
AF ¯ Macron
B0 ° Degree Symbol
B1 ± Plus Minus Symbol
B2 ² Superscript 2
B3 ³ Superscript 3
B4 ´ Acute Accent
B5 µ Micro
B6 Paragraph Mark
B7 · Interpunct (Centered Dot)
B8 ¸ Cedilla
B9 ¹ Superscript 1
BA º Ordinal indicator
BB  » Angle Quote Double Right
BC ¼ Fraction One Quarter
BD ½ Fraction One Half
BE ¾ Fraction Three Quareters
BF ¿ Inverted Question Mark
C0 À A Grave (Upper Case)
C1 Á A Acute (Upper Case)
C2 Â A Circumflex (Upper Case)
C3 Ã A Tilde (Upper Case)
C4 Ä A Diaeresis (Upper Case)
C5 Å A Overring (Upper Case)
C6 Æ AE Ligature (Upper Case)
C7 Ç C Cedilla (Upper Case)
C8 È E Grave (Upper Case)
C9 É E Acute (Upper Case)
CA Ê E Circumflex (Upper Case)
CB Ë E Diaeresis (Upper Case)
CC Ì I Grave (Upper Case)
CD Í I Acute (Upper Case)
CE Î I Circumflex (Upper Case)
CF Ï I Diaeresis (Upper Case)
D0 Ð Eth or EDH (TH) (Upper Case)
D1 Ñ N Tilde (Upper Case)
D2 Ò O Grave (Upper Case)
D3 Ó O Acute (Upper Case)
D4 Ô O Circumflex (Upper Case)
D5 Õ O Tilde (Upper Case)
D6 Ö O Diaeresis (Upper Case)
D7 × Multiplication Sign (math) (Upper Case)
D8 Ø O vowel (foreign) (Upper Case)
D9 Ù U Grave (Upper Case)
DA Ú U Acute (Upper Case)
DB Û U Circumflex (Upper Case)
DC Ü U Diaeresis (Upper Case)
DD Ý Y Acute (Upper Case)
DE Þ Thorn (TH)
DF ß Eszett (German)
E0 à A Grave (lower case)
E1 á A Acute (lower case)
E2 â A Circumflex (lower case)
E3 ã A Tilde (lower case)
E4 ä A Diaeresis (lower case)
E5 å A Overring (lower case)
E6 æ AE Ligature (lower case)
E7 ç C Cedilla (lower case)
E8 è E Grave (lower case)
E9 é E Acute (lower case)
EA ê E Circumflex (lower case)
EB ë E Diaeresis (lower case)
EC ì I Grave (lower case)
ED í I Acute (lower case)
EE î I Circumflex (lower case)
EF ï I Diaeresis (lower case)
F0 ð Eth or EDH (TH) (lower case)
F1 ñ N Tilde (lower case)
F2 ò O Grave (lower case)
F3 ó O Acute (lower case)
F4 ô O Circumflex (lower case)
F5 õ O Tilde (lower case)
F6 ö O Diaeresis (lower case)
F7 ÷ Division Sign (math) (lower case)
F8 ø O vowel (foreight) (lower case)
F9 ù U Grave (lower case)
FA ú U Acute (lower case)
FB û U Circumflex (lower case)
FC ü U Diaeresis (lower case)
FD ý Y Acute (lower case)
FE þ Thorn (TH) (lower case)
FF ÿ Y Diaeresis (lower case)

WINDOWS-1552

Further extends the US-ASCII enconding by adding 27 additional characters between hex 80 thru 9F (5 characters remain undefined).

HEX Char Description
80 Euro Sign
82 Smart Quote Single (low)
83 ƒ Forin
84 Smart Quote Double (low)
85 Elipse
86 Dagger Single Cross
87 Dagger Double Cross
88 ˆ Circumflex
89 Per-Mili (like Per-Cent)
8A Š
8B Angle Quote Single Left
8C Œ OE ligagure (upper case)
8E Ž
91 Smart Quote Single Left
92 Smart Quote Single Right
93 Smart Quote Duble Left
94 Smart Quote Duble Right
95 Bullet
96 Dash (longer thicker)
97 Dash (longer thicker)
98 ˜ Tilde
99 Trade Mark
9A š
9B Angle Quote Single Right
9C œ OE ligagure (lower case)
9E ž
9F Ÿ