Friday, February 11, 2011

XML Encoding Issues

I thought it would be worth documenting today's experience, since I tend to forget most of the solutions to my problems.

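It is important to be aware of XML's limitations, especially where encoding and entities are concerned. UTF-8 can represent every Unicode character, so the encoding itself is not the limit. The trouble is that XML predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so the HTML entities that cover the ISO-8859-1 characters, such as &eacute;, are undefined in XML unless a DTD declares them. Raw ISO-8859-1 bytes inside a document declared as UTF-8 are rejected too. Either way, you will keep getting errors in your XML! One solution is to change the encoding of your XML to match the encoding of your data, and the other is to convert these characters to Unicode numeric character references (for example &#233; for é). I chose the latter solution.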
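
To see the difference for yourself, here is a small sketch that feeds both forms to SimpleXML. It assumes the libxml and SimpleXML extensions are enabled (they are by default), and the `<doc>` element is just a made-up wrapper for the test:

```php
<?php
// Collect libxml errors internally instead of printing warnings.
libxml_use_internal_errors(true);

// Named HTML entity: not defined in XML, so parsing fails.
$named = simplexml_load_string('<doc>r&eacute;sum&eacute;s</doc>');

// Numeric character reference: always valid in XML.
$numeric = simplexml_load_string('<doc>r&#233;sum&#233;s</doc>');

var_dump($named === false);      // bool(true)
echo (string) $numeric, "\n";    // the text, as UTF-8 bytes
```

The failed parse leaves its messages in `libxml_get_errors()`, which is handy when you want to log them.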

Here's a useful function I picked up from PHP.net that converts HTML entities to Unicode numeric character references (credits to php.net):

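    public static function htmlentitiesToUnicode($input)
    {
      // Build the table in ISO-8859-1 so every key is a single byte
      // and ord() returns its Unicode code point.
      $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1');
      $htmlEntities = array_values($table);
      $entitiesDecoded = array_keys($table);

      $utf8Entities = array();
      $num = count($entitiesDecoded);
      for ($u = 0; $u < $num; $u++) {
        $utf8Entities[$u] = '&#'.ord($entitiesDecoded[$u]).';';
      }

      // $input is assumed to be ISO-8859-1 text; htmlentities() takes
      // quote flags (ENT_QUOTES), not the HTML_ENTITIES table constant.
      return str_replace($htmlEntities, $utf8Entities, htmlentities($input, ENT_QUOTES, 'ISO-8859-1'));
    }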
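
For what it's worth, here is another way to get the same kind of output without a translation table. This is only a sketch and not the php.net function: the name `namedEntitiesToNumeric` is made up, and it assumes PHP 7 or later, the mbstring extension, and input that is already UTF-8 with HTML entities in it.

```php
<?php
// Hypothetical helper: turn named HTML entities into XML-safe text.
function namedEntitiesToNumeric(string $input): string
{
    // 1. Decode every HTML entity into the real UTF-8 character.
    $decoded = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // 2. Re-escape the characters XML treats as markup (&, <, >, quotes).
    $escaped = htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');

    // 3. Replace every non-ASCII character with a decimal reference.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

echo namedEntitiesToNumeric('r&eacute;sum&eacute;s &amp; CVs'), "\n";
// r&#233;sum&#233;s &amp; CVs
```

The upside of going through real characters first is that it is not limited to the ISO-8859-1 range.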

Another problem I encountered was in the transmission and reading of the XML data. I used PHP's SimpleXML to convert my XML to objects. Even though the HTML entities had been converted to numeric character references, SimpleXML was still giving me gibberish characters as output!

Here's a sample of what I am talking about:
Original text: résumés
Numeric character references (XML): r&#233;sum&#233;s
Gibberish SimpleXML output (UTF-8 bytes shown as ISO-8859-1): rÃ©sumÃ©s

Fortunately there is a simple solution for this problem: iconv(). SimpleXML always returns its strings as UTF-8, so when your page is displayed as ISO-8859-1, just use
$val = iconv('UTF-8', 'ISO-8859-1', $val);

and hopefully that should solve your encoding problems.
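
Here is the whole round trip as a small sketch, in case it helps to see the bytes. It assumes the SimpleXML and iconv extensions are enabled, and the `<doc>` wrapper is made up for the example:

```php
<?php
$xml = simplexml_load_string('<doc>r&#233;sum&#233;s</doc>');

// SimpleXML hands back UTF-8: each é is two bytes (C3 A9).
$utf8 = (string) $xml;

// Convert to ISO-8859-1: each é becomes one byte (E9).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo strlen($utf8), ' ', bin2hex($utf8), "\n";     // 9 72c3a973756dc3a973
echo strlen($latin1), ' ', bin2hex($latin1), "\n"; // 7 72e973756de973
```

Keep in mind that iconv() returns false when a character has no ISO-8859-1 equivalent, so check the result before using it. If the page itself is served as UTF-8, the conversion should not be needed at all.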
