<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Bart van Raaij</title>
    <description>The latest articles on Forem by Bart van Raaij (@bartvanraaij).</description>
    <link>https://forem.com/bartvanraaij</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438590%2F21d0a233-bb01-4289-936b-a5d1667da231.jpg</url>
      <title>Forem: Bart van Raaij</title>
      <link>https://forem.com/bartvanraaij</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bartvanraaij"/>
    <language>en</language>
    <item>
      <title>Converting UTF-8 strings to ASCII using the ICU Transliterator</title>
      <dc:creator>Bart van Raaij</dc:creator>
      <pubDate>Sat, 17 Oct 2020 17:31:30 +0000</pubDate>
      <link>https://forem.com/bartvanraaij/converting-utf-8-strings-to-ascii-using-the-icu-transliterator-704</link>
      <guid>https://forem.com/bartvanraaij/converting-utf-8-strings-to-ascii-using-the-icu-transliterator-704</guid>
      <description>&lt;p&gt;With the general availability and widespread support of UTF-8, character encoding issues are thankfully becoming a problem of the past. But unfortunately there are still tons of legacy systems out there that don't support it.&lt;/p&gt;

&lt;p&gt;I ran into this exact problem quite recently. I had built a "Book an appointment" form for a client. All user input, including the customer's name, was sent to the client's legacy CRM via a proprietary HTTP API. It turned out that said CRM only accepts &lt;a href="https://en.wikipedia.org/wiki/ASCII"&gt;ASCII&lt;/a&gt; ☹️. That's right: &lt;em&gt;Just&lt;/em&gt; ASCII, not even &lt;a href="https://en.wikipedia.org/wiki/Extended_ASCII"&gt;Extended ASCII&lt;/a&gt;. Any attempt to send a string with non-ASCII characters resulted in an HTTP 400 error. That meant that people with names like Bjørn or François couldn't use the form, because those names contain non-ASCII characters. Naturally, it is not acceptable by any means to exclude Bjørn and François from using our form just because their names contain letters that don't appear on a &lt;a href="https://en.wikipedia.org/wiki/Teletype_Model_33"&gt;1960s teletypewriter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I consulted with the client but sadly the problem couldn't (or wouldn't) be fixed on their end, and they asked if I could provide a solution. So I needed to come up with a way to transform or convert the user's input into ASCII.&lt;/p&gt;

&lt;h2&gt;
  
  
  The desired result
&lt;/h2&gt;

&lt;p&gt;First, let's define the desired result. I'll be using this fictitious name: &lt;code&gt;Daniël Renée François Bjørn in’t Veld&lt;/code&gt;. Every word in this string except the last one contains a non-ASCII character. To convert this string to ASCII, we should replace each of those characters with a similar-looking ASCII character. To be precise, I want the end result to be: &lt;code&gt;Daniel Renee Francois Bjorn in't Veld&lt;/code&gt;. In my opinion that is as close as we can get.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At this point I want to stress that if you have a viable way to avoid converting user input (e.g., someone's name), you absolutely should!&lt;/strong&gt;&lt;br&gt;
In other words: if someone is called Bjørn, please go out of your way to make sure your systems call them Bjørn. Someone's name is part of their identity and not something you want to mess up. I for one already get annoyed when a system autocapitalises my surname into "Van Raaij". Imagine my frustration if I were to be called "B@rt" just because a system doesn't have the &lt;code&gt;a&lt;/code&gt; character in its character set.&lt;/p&gt;

&lt;p&gt;That being said: given the choice between a) not being able to use a form or service at all or b) being called Bjorn, I'm sure that Bjørn would choose the latter.&lt;/p&gt;

&lt;p&gt;Enough talk, let's code! Converting a UTF-8 string to ASCII can't be hard, right?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: I'll be using PHP, but the examples are applicable to other languages as well.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The obvious choice: iconv
&lt;/h2&gt;

&lt;p&gt;If you search for &lt;em&gt;php utf8 to ascii&lt;/em&gt;, &lt;a href="https://www.php.net/manual/en/function.iconv.php"&gt;&lt;code&gt;iconv&lt;/code&gt;&lt;/a&gt; is the first function that pops up:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;iconv — Convert string to requested character encoding  &lt;/p&gt;

&lt;p&gt;&lt;code&gt;iconv ( string $in_charset , string $out_charset , string $str ) : string&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Performs a character set conversion on the string &lt;code&gt;str&lt;/code&gt; from &lt;code&gt;in_charset&lt;/code&gt; to &lt;code&gt;out_charset&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As the &lt;a href="https://www.php.net/manual/en/function.iconv.php"&gt;documentation&lt;/a&gt; states, there are three 'modes' in which iconv can operate: &lt;em&gt;plain&lt;/em&gt;, &lt;em&gt;IGNORE&lt;/em&gt; and &lt;em&gt;TRANSLIT&lt;/em&gt;. Let's not waste any time and put it to the test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="nv"&gt;$name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Daniël Renée François Bjørn in’t Veld'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$plain&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;iconv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"UTF-8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ASCII"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$ignore&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;iconv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"UTF-8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ASCII//IGNORE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$translit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;iconv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"UTF-8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ASCII//TRANSLIT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;var_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$plain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$ignore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$translit&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://3v4l.org/RREJl"&gt;Run this code example on 3v4l.org »&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Notice: iconv(): Detected an illegal character in input string in /in/RREJl on line 4
bool(false)
string(32) "Danil Rene Franois Bjrn int Veld"
string(37) "Dani?l Ren?e Fran?ois Bj?rn in't Veld"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well, that's disappointing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;em&gt;plain&lt;/em&gt; mode triggered an &lt;code&gt;E_NOTICE&lt;/code&gt; and returned &lt;code&gt;false&lt;/code&gt;. This means that iconv detected one or more characters that it couldn't represent in the output charset, and simply gave up;&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;IGNORE&lt;/em&gt; mode simply discarded the characters it couldn't fit into ASCII;&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;TRANSLIT&lt;/em&gt; mode tried to replace the non-ASCII characters with similar-looking ASCII characters, but mostly failed. Apart from &lt;code&gt;’&lt;/code&gt; (the
&lt;a href="https://www.compart.com/en/unicode/U+2019"&gt;Right Single Quotation Mark&lt;/a&gt;, which is not uncommon in
Dutch surnames), every non-ASCII character was replaced by a question mark.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The PHP docs warn that this may happen: &lt;em&gt;"TRANSLIT conversion is likely to fail for characters which are illegal for the out_charset."&lt;/em&gt; And if you read the comments in the documentation, you'll find that iconv's &lt;em&gt;TRANSLIT&lt;/em&gt; mode behaves very inconsistently between systems, because its output depends on the underlying iconv implementation (and, with glibc, on the current locale). So apparently we can't rely on iconv's &lt;em&gt;TRANSLIT&lt;/em&gt; mode at all.&lt;/p&gt;

&lt;p&gt;Technically I could have used the &lt;em&gt;IGNORE&lt;/em&gt; mode of iconv and been done with it. Its output doesn't contain any non-ASCII characters, so my API call wouldn't fail anymore. But it's not the result I set out to achieve. Again: if my name is Bjørn, I want to be called Bjørn. I can live with "Bjorn", but not "Bjrn" and certainly not "Bj?rn".&lt;/p&gt;
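
&lt;p&gt;In other words, "contains only ASCII" and "is a good ASCII representation" are two different things. A minimal sketch of a check for the former, using a hypothetical &lt;code&gt;isAscii()&lt;/code&gt; helper (not part of the original form code). Note that both lossy iconv results above would pass it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helper: returns true when $input consists only of
// ASCII bytes (0x00-0x7F), i.e. what the legacy CRM would accept.
function isAscii(string $input): bool
{
    // Without the /u modifier PCRE matches bytes, so every byte of a
    // multi-byte UTF-8 sequence (0x80 and up) falls outside the class.
    return preg_match('/^[\x00-\x7F]*$/', $input) === 1;
}

var_dump(isAscii('Bjrn'));  // bool(true), although it is not what we want
var_dump(isAscii('Bj?rn')); // bool(true), likewise
var_dump(isAscii('Bjørn')); // bool(false)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;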

&lt;h2&gt;
  
  
  Transliteration
&lt;/h2&gt;

&lt;p&gt;Although iconv's &lt;em&gt;TRANSLIT&lt;/em&gt; mode doesn't seem usable, I feel we are on the right track with &lt;em&gt;transliteration&lt;/em&gt;. So what exactly is transliteration?&lt;/p&gt;

&lt;p&gt;Transliteration, in the general sense of the word, is "conversion of a text from one script to another that involves swapping letters in predictable ways" (&lt;a href="https://en.wikipedia.org/wiki/Transliteration"&gt;Wikipedia&lt;/a&gt;). It is, for example, the conversion of &lt;code&gt;Игорь Стравинский&lt;/code&gt; (Cyrillic script) to &lt;code&gt;Igor Stravinsky&lt;/code&gt; (Latin script). &lt;/p&gt;

&lt;p&gt;Now think of a character set as a script, and it immediately makes sense to use transliteration to convert text from one character set to another. The character &lt;code&gt;ø&lt;/code&gt; exists in the UTF-8 'script' but not in ASCII. Transliterating UTF-8 to ASCII means finding the ASCII character that represents it as closely as possible.&lt;/p&gt;
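
&lt;p&gt;To make the analogy a bit more concrete: &lt;code&gt;ø&lt;/code&gt; is Unicode code point U+00F8, which UTF-8 encodes as two bytes, whereas ASCII only defines code points 0 to 127. A quick sketch to inspect this (it assumes the mbstring extension is available for &lt;code&gt;mb_ord()&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// ASCII only defines code points 0 to 127 (0x00 to 0x7F).
$char = 'ø';

var_dump(mb_ord($char, 'UTF-8')); // int(248), i.e. U+00F8: outside ASCII
var_dump(strlen($char));          // int(2): UTF-8 needs two bytes for it
var_dump(bin2hex($char));         // string(4) "c3b8"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;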

&lt;p&gt;Is it possible to perform these kinds of transliteration programmatically? Yes, it is!&lt;/p&gt;

&lt;h2&gt;
  
  
  International Components for Unicode (ICU)
&lt;/h2&gt;

&lt;p&gt;Enter &lt;em&gt;ICU&lt;/em&gt;. The &lt;a href="https://unicode-org.github.io/icu/userguide/icufaq/#what-is-icu"&gt;&lt;em&gt;International Components for Unicode&lt;/em&gt;&lt;/a&gt; constitute a "cross-platform Unicode based globalisation library" with components for "locale-sensitive string comparison, date/time/number/currency/message formatting, text boundary detection, character set conversion and so on". It's built and provided by the &lt;a href="https://github.com/unicode-org"&gt;Unicode Consortium&lt;/a&gt; as C/C++ and Java libraries, but wrappers exist for &lt;a href="http://site.icu-project.org/related"&gt;plenty of other languages&lt;/a&gt;, including PHP. In PHP it's better known as the &lt;a href="https://www.php.net/manual/en/intro.intl.php"&gt;Internationalization extension&lt;/a&gt;, or &lt;code&gt;ext-intl&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Speaking of which, this sentence on the ICU Related Projects page made me smile: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The upcoming PHP 6 language is expected to support Unicode through ICU4C". &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As you may know, PHP 6 &lt;a href="https://ma.ttias.be/php6-missing-version-number/"&gt;never saw the light of day&lt;/a&gt;, but it &lt;em&gt;did&lt;/em&gt; &lt;a href="https://www.phproundtable.com/episode/what-happened-to-php-6"&gt;lay the groundwork for the intl extension&lt;/a&gt;.&lt;/p&gt;
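
&lt;p&gt;Because &lt;code&gt;ext-intl&lt;/code&gt; is an optional extension, it may be worth confirming that it is installed before relying on it. A minimal sketch; it also prints the ICU version, since transliteration data can change between ICU releases:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Fail early if ext-intl (and with it the ICU Transliterator) is missing.
if (!extension_loaded('intl')) {
    throw new RuntimeException('The intl extension is not loaded.');
}

// INTL_ICU_VERSION holds the version of the ICU library used by ext-intl.
echo 'ICU version: ', INTL_ICU_VERSION, PHP_EOL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;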

&lt;p&gt;I could probably write a blog post about each and every component in the ICU library (I find internationalisation mighty interesting), but let's stay focused and see if the ICU Transliterator can help us in our quest to correctly convert UTF-8 to ASCII.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the ICU Transliterator
&lt;/h2&gt;

&lt;p&gt;Let's dive right in. The PHP function we're looking for is &lt;a href="https://www.php.net/manual/en/transliterator.transliterate.php"&gt;&lt;code&gt;transliterator_transliterate&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;transliterator_transliterate — Transliterate a string&lt;/p&gt;

&lt;p&gt;&lt;code&gt;transliterator_transliterate ( mixed $transliterator , string $subject [, int $start [, int $end ]] )&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Transforms a string or part thereof using an ICU transliterator.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Note: I'm using the procedural function here for brevity, but PHP also provides a &lt;code&gt;Transliterator&lt;/code&gt;  class.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The function call looks pretty straightforward at first, but the &lt;code&gt;$transliterator&lt;/code&gt; parameter is where it gets a bit tricky. The docs are fairly brief and don't give much guidance, but fortunately the &lt;a href="https://unicode-org.github.io/icu/userguide/transforms/general/#icu-transliterators"&gt;ICU docs provide some insights&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Latin-ASCII: Converts non-ASCII-range punctuation, symbols, and Latin letters in an approximate ASCII-range equivalent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Jackpot? Let's try!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="nv"&gt;$name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Daniël Renée François Bjørn in’t Veld'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$translitRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Latin-ASCII'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nv"&gt;$nameAscii&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;transliterator_transliterate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$translitRules&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;var_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$nameAscii&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://3v4l.org/ck1jT"&gt;Run this code example on 3v4l.org »&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;string(37) "Daniel Renee Francois Bjorn in't Veld"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it 👏 🥳! The ICU Transliterator produced our exact desired output! No warnings, errors or unexpected side effects. Mission accomplished! &lt;/p&gt;

&lt;h2&gt;
  
  
  Real transliteration
&lt;/h2&gt;

&lt;p&gt;Or is it? Remember Igor Stravinsky? What if he were to use my form and enter his name in Cyrillic script instead of Latin? With our current implementation that won't work: the output will simply be &lt;code&gt;Игорь Стравинский&lt;/code&gt;. &lt;br&gt;
This is because we only told the transliterator to convert Latin characters to ASCII, so it will leave the Cyrillic characters unaffected. However, we can apply multiple transliteration rules, like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="nv"&gt;$name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Игорь Стравинский'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$translitRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Any-Latin; Latin-ASCII;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nv"&gt;$nameAscii&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;transliterator_transliterate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$translitRules&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;var_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$nameAscii&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://3v4l.org/2AiNk"&gt;Run this code example on 3v4l.org »&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;string(17) "Igor' Stravinskij"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By prepending the &lt;code&gt;Any-Latin&lt;/code&gt; transform, we tell the transliterator to first convert text from any script into Latin script, and then to convert that Latin text to ASCII using &lt;code&gt;Latin-ASCII&lt;/code&gt;. The two transform IDs are separated by a semicolon. That's it!&lt;/p&gt;
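
&lt;p&gt;Running the two transforms one after the other shows what each of them contributes. A small sketch, reusing the example name (the variable names are just for illustration). Depending on the ICU version, the intermediate Latin result may still contain non-ASCII characters, which is exactly what &lt;code&gt;Latin-ASCII&lt;/code&gt; takes care of:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
$name = 'Игорь Стравинский';

// Step 1: romanise the Cyrillic text into Latin script.
$latin = transliterator_transliterate('Any-Latin', $name);

// Step 2: map any remaining non-ASCII Latin characters to ASCII.
$ascii = transliterator_transliterate('Latin-ASCII', $latin);

var_dump($latin, $ascii);

// The compound ID performs both steps in a single call.
var_dump($ascii === transliterator_transliterate('Any-Latin; Latin-ASCII;', $name)); // bool(true)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;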

&lt;p&gt;With these few lines of PHP code, I had found a simple yet reliable way to transform names written in virtually any script into ASCII. Without hesitation I wrote a helper function around this code, made sure that all user input in my customer's form was passed through it and end-to-end tested the form again. And as you might expect: the API calls succeeded and my customer was happy with my solution. All done!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The result of the &lt;code&gt;Any-Latin&lt;/code&gt; transform may not be exactly what you would have expected, namely &lt;code&gt;Igor Stravinsky&lt;/code&gt;. This is because transliteration between scripts isn't an exact science: "there are multiple incompatible standards and in reality transliteration is often carried out without any uniform standards" (&lt;a href="https://en.wikipedia.org/wiki/Romanization_of_Russian#Systematic_transliterations_of_Cyrillic_to_Latin"&gt;Wikipedia&lt;/a&gt;). For example, &lt;a href="https://it.wikipedia.org/wiki/Igor%27_F%C3%ABdorovi%C4%8D_Stravinskij"&gt;the Italian Wikipedia page for Igor Stravinsky&lt;/a&gt; spells his name the same way as the output above, whereas the English page uses "Igor Stravinsky".&lt;/em&gt;&lt;/p&gt;
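
&lt;p&gt;The helper function mentioned above isn't shown in this post. What follows is a hypothetical sketch of what it could look like; the name &lt;code&gt;toAscii()&lt;/code&gt; and the final clean-up step are assumptions. Characters that have no Latin equivalent, such as emoji, can pass through both transforms unchanged, so the sketch strips whatever is still outside ASCII before the value is sent anywhere:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helper: transliterate to ASCII, then drop anything left over.
function toAscii(string $input): string
{
    // Create the transliterator once and reuse it on subsequent calls.
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = \Transliterator::create('Any-Latin; Latin-ASCII;');
        if ($transliterator === null) {
            throw new \RuntimeException('Could not create the transliterator.');
        }
    }

    $ascii = $transliterator-&amp;gt;transliterate($input);
    if ($ascii === false) {
        throw new \RuntimeException('Transliteration failed.');
    }

    // Remove every byte that is still outside the ASCII range.
    return preg_replace('/[^\x00-\x7F]/', '', $ascii);
}

var_dump(toAscii('Daniël Renée François Bjørn in’t Veld'));
var_dump(toAscii('Игорь Стравинский 😎'));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;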

&lt;h2&gt;
  
  
  Bonus tip: a transliteration-powered slugify function
&lt;/h2&gt;

&lt;p&gt;So far I have used two relatively simple transliterator instructions: &lt;code&gt;Any-Latin&lt;/code&gt; and &lt;code&gt;Latin-ASCII&lt;/code&gt;. The ICU Transliterator is far more powerful, however.&lt;/p&gt;

&lt;p&gt;I'll leave you with a final bonus tip: here's a slugify function that uses the ICU Transliterator to create a slug (an SEO-friendly, human-readable URL part) from any arbitrary string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;slugify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$translitRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;":: Any-Latin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;":: [:Nonspacing Mark:] Remove"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;":: [:Punctuation:] Remove"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;":: [:Symbol:] Remove"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;":: Latin-ASCII"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;":: Lower()"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"' ' {' '} &amp;gt; "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"::NULL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"[:Separator:] &amp;gt; '-'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nv"&gt;$transliterator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nc"&gt;\Transliterator&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;createFromRules&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;implode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;';'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$translitRules&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$transliterator&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;transliterate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;?php François😎: _+ / Стравинский`😜.'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nv"&gt;$slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;slugify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$title&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nb"&gt;var_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$slug&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://3v4l.org/Hr0iJ"&gt;Run this code example on 3v4l.org »&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;string(24) "php-francois-stravinskij"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I won't get into details as to how this works because this article is long enough as it is. At this point I encourage you to read more about the ICU Transliterator and experiment with it yourself!&lt;/p&gt;
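
&lt;p&gt;As a starting point for such experiments, here is a sketch that isolates the whitespace handling of &lt;code&gt;slugify()&lt;/code&gt;, with a comment per rule. The input string is an arbitrary example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
$rules = implode(';', [
    ':: Lower()',           // lowercase everything
    "' ' {' '} &amp;gt; ",         // delete a space whose preceding character is also a space
    '::NULL',               // end this pass, so the next rule sees the collapsed text
    "[:Separator:] &amp;gt; '-'",  // replace every remaining separator with a hyphen
]);

$transliterator = \Transliterator::createFromRules($rules);
if ($transliterator === null) {
    throw new \RuntimeException('Invalid transliteration rules.');
}

var_dump($transliterator-&amp;gt;transliterate('Hello   Big  World')); // string(15) "hello-big-world"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;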

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;What can we conclude from this? I think the ICU Transliterator proves to be a valuable tool, not only to convert text from one script to another, but also to convert strings between character sets. Its output is more reliable than that of &lt;code&gt;iconv&lt;/code&gt;, and it makes far more extensive conversions possible.&lt;/p&gt;

&lt;p&gt;Do you have any questions, comments or tips following this article? Feel free to leave a comment below, or reach out to me &lt;a href="https://twitter.com/bartvanraaij"&gt;on Twitter&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Thank you for reading my first-ever technical blog post. 😇&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading and interesting links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://research.google/pubs/pub36450/"&gt;"Proper Name Transliteration with ICU Transforms"&lt;/a&gt; — A research study by Sascha Brawer, Martin Jansche, Hiroshi Takenaka and Yui Terashima (Google), presented at the 34th Internationalization &amp;amp; Unicode Conference in 2010;&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://www.open-std.org/jtc1/sc22/wg20/docs/n915-transliteration-icu.pdf"&gt;"Transliteration in ICU"&lt;/a&gt; — Slides and transcript of a presentation by Mark Davis and Alan Liu at the 19th International Unicode Conference in 2001;&lt;/li&gt;
&lt;li&gt;The official &lt;a href="https://unicode-org.github.io/icu/"&gt;ICU Documentation&lt;/a&gt;, &lt;a href="http://userguide.icu-project.org/transforms/general"&gt;the old ICU documentation&lt;/a&gt; and &lt;a href="https://github.com/unicode-org/icu"&gt;ICU on GitHub&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.php.net/manual/en/class.transliterator.php"&gt;PHP Transliterator&lt;/a&gt; in the PHP documentation;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/"&gt;"Falsehoods Programmers Believe About Names"&lt;/a&gt; — a must-read article by Patrick McKenzie;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/track/3ZHZmrK9ZD9WAfBcgjz2Gs?si=v-POCuUCRxSEHo3cXUg8vA"&gt;Listen to The Final Hymn&lt;/a&gt; of Igor Stravinsky's "The Firebird" suite on Spotify, performed by the Dutch Royal Concertgebouw Orchestra.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>php</category>
      <category>internationalization</category>
      <category>unicode</category>
      <category>transliteration</category>
    </item>
  </channel>
</rss>
