I'm trying to split Arabic text into individual words. Here's sample code:
var str = "المادة 1 يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء.";
var strWithHashtag = "المادة 1 يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن #يعامل بعضهم بعضًا بروح الإخاء.";
var substrings = strWithHashtag.Split(' ');
The text is copied from https://r12a.github.io/scripts/arabic/, and it's the first paragraph under sample (arabic). I have two questions:
- Why is the period sign placed at the end of `str` even though it appears as the first character on the web page?
- When I split the string into individual words, `يعامل#` becomes `#يعامل`. How can I keep the original position of the `#` sign? Eventually, I need to extract hashtags from RTL languages, and so I need `#` to appear as the first character of the RTL hashtag.
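
To check whether the `#` really moves or is only displayed in a different place, here is a minimal diagnostic sketch. It assumes the string is stored in logical (typing) order and that `Split` does not reorder characters, so the position of `#` on screen may be a rendering effect of the bidirectional algorithm. `HashtagProbe`, `ExtractHashtags`, and `CodeUnits` are hypothetical names introduced for illustration, and the sample text is written with `\u` escapes so the editor's own RTL rendering cannot hide the storage order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashtagProbe
{
    // Hypothetical helper: returns the space-separated tokens whose first
    // stored (logical) character is '#'.
    public static List<string> ExtractHashtags(string text)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(t => t.Length > 1 && t[0] == '#')
            .ToList();
    }

    // Formats each UTF-16 code unit as U+XXXX, in storage order.
    public static string CodeUnits(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Three words from the sample, with '#' typed before the middle word.
        var text = "\u0623\u0646 #\u064A\u0639\u0627\u0645\u0644 \u0628\u0639\u0636\u0647\u0645";

        foreach (var tag in ExtractHashtags(text))
        {
            // Prints: U+0023 U+064A U+0639 U+0627 U+0645 U+0644
            Console.WriteLine(CodeUnits(tag));
        }
    }
}
```

If the first code unit printed is `U+0023`, the `#` is still the first character of the token in memory, whatever the debugger or console shows. This sketch splits on spaces only, so a hashtag followed directly by punctuation would keep that punctuation attached.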