The fact that MS Word and some other sources use CP-1252, and that it is so close to Latin-1 (ISO-8859-1), causes a lot of confusion. What confused me the most was finding that MySQL's "latin1" character set, long its default, is actually CP-1252.
You may run into trouble if you find yourself tempted to do something like this:
$trans[chr(149)] = '&bull;';
$trans[chr(150)] = '&ndash;';
$trans[chr(151)] = '&mdash;';
$trans[chr(152)] = '&tilde;';
$trans[chr(153)] = '&trade;';
Don't do it. DON'T DO IT!
You can use:
$translationTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'WINDOWS-1252');
or just convert directly:
$output = htmlentities($input, ENT_NOQUOTES, 'WINDOWS-1252');
But your web page is probably encoded as UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
$output = mb_convert_encoding($input, 'UTF-8', 'WINDOWS-1252');
$output = htmlentities($output, ENT_NOQUOTES, 'UTF-8');
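For anyone who wants to try the two steps together, here is a minimal sketch. The function name cp1252ToHtml is hypothetical (it is not part of PHP), and the sketch assumes the mbstring extension is loaded and that the input really is CP-1252.

```php
<?php
// Hypothetical helper (not from the note above): converts CP-1252 bytes
// to UTF-8, then escapes the result for HTML output.
function cp1252ToHtml(string $input): string
{
    // Step 1: re-encode the Windows-1252 bytes as UTF-8.
    $utf8 = mb_convert_encoding($input, 'UTF-8', 'Windows-1252');

    // Step 2: escape for HTML, naming the encoding explicitly instead of
    // relying on the default_charset setting.
    return htmlentities($utf8, ENT_NOQUOTES, 'UTF-8');
}

// In CP-1252, "\x95" is a bullet, "\xE9" is e-acute and "\x97" is an em dash.
echo cp1252ToHtml("\x95 caf\xE9 \x97"), "\n";
// prints: &bull; caf&eacute; &mdash;
```

If the page is served as UTF-8 anyway, htmlspecialchars() on the converted string is probably enough, since the named entities are only needed when the output encoding cannot represent the characters directly.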