
I just created a new web server on a Linux box with nginx. It seems to be working correctly for most content.

I just converted a site which used to be on a Windows Server box, and for the most part the content and functionality (HTML and JavaScript) transferred OK. However, I have discovered that nginx seems to have problems with image names (perhaps other files also??) which contain accented characters, for example é, è, ô, etc.

If there were only a handful I could just manually rename the files, but there are hundreds (maybe thousands), which makes a manual process unworkable. Can someone offer a way to easily rename files with these accented characters?

Thanks..RDK

edit: The original conversion of the old site had lots of pages using "windows-1252" and a few using "UTF-8" for the "charset". The latter had issues displaying special characters in page content, usually as the "black diamond ?" symbol. The other seemed to have issues with JavaScript and also some display issues. After a web search on that issue I changed all "charset=" values to "iso-8859-1", which is the default for many browsers, and that corrected all of those issues. But now I have the special characters in file names problem...


2 Answers


One way is to strip the diacritics with Perl's rename, if you need to rename the files to their ASCII equivalents (the expression uses the Text::Undiacritic module from CPAN):

rename -u utf8 '
    BEGIN{use Text::Undiacritic qw(undiacritic)}
    s/.*/undiacritic($&)/e
' éééé.txt 
rename(éééé.txt, eeee.txt)
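
Since there are hundreds of files, it may be safer to dry-run the same rule over a whole image directory first. This is a sketch only: the path and extensions below are examples, and the -n (print, don't rename) switch should be checked against rename --help on your system:

cd /var/www/html/images        # example path -- adjust to wherever the images live
rename -n -u utf8 '
    BEGIN{use Text::Undiacritic qw(undiacritic)}
    s/.*/undiacritic($&)/e
' *.jpg *.png *.gif
# if the proposed names look right, drop -n to rename for real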

Another way is to use the detox utility, which is available as a package on Debian/Ubuntu and other distros.
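
For example (a sketch only; the web-root path is an assumption, and the exact option names are worth checking against man detox on your distro):

detox -v -r --dry-run /var/www/html    # list the renames detox would perform, recursively
detox -v -r /var/www/html              # then actually rename the files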


A last option is this script, based on convmv(1) and translated into English from a French project (forum.ubuntu-fr.org). It is intended to convert file names from a wrong charset to UTF-8. It is not a script of mine (it is by Lapogne71), but it could solve the issue:

#!/bin/bash

VERSION="v0.04"

#---------------------------------------------------------------------------------------
# This script loops the "convmv" utility, which converts file names encoded in
# something other than UTF-8 to UTF-8.
# Restart the script with the ALLCODES argument if no result has been found.
#---------------------------------------------------------------------------------------

# Colors of the text displayed in the shell
RED="\033[1;31m"
NORMAL="\033[0;39m"
BLUE="\033[1;36m"
GREEN="\033[1;32m"

echo
echo -e "$GREEN $0 $NORMAL $VERSION"
echo

echo "----------------------------------------------------------
This script loops the 'convmv' utility, which converts file names
encoded in something other than UTF-8 to UTF-8.
Restart the script with the ALLCODES argument if no result has been found.
----------------------------------------------------------"

# The main loop launches convmv tests to "visually" detect the original encoding.
# We only loop over the iso-8859* and cp* code families, as they are the most
# likely ones (EBCDIC codes have also been removed from the list).
CODES_LIST=" iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 cp437 cp737 cp775 cp850 cp852 cp855 cp856 cp857 cp860 cp861 cp862 cp863 cp864 cp865 cp866 cp869 cp874 cp932 cp936 cp949 cp950 cp1250 cp1251 cp1252 cp1253 cp1254 cp1255 cp1256 cp1257 cp1258 "

# Check that the convmv utility is installed
path=$(which convmv 2> /dev/null)
if [ -z "$path" ]; then
    echo -e "$RED ERROR: convmv is not installed, please install it by typing:"
    echo
    echo -e "$BLUE sudo apt-get install convmv "
    echo
    echo -e "$RED ==> program exit"
    echo
    echo -e "$NORMAL"
    exit 1
fi

# To loop over all the codepages supported by convmv, pass the ALLCODES argument
if [ "$1" = "ALLCODES" ]; then
    CODES_LIST=$(convmv --list)
    echo
    echo -e "$RED Check which original encoding seems correct (press 'y' and validate if waiting for display)$NORMAL"
    echo
fi

# Main loop of the program
for CODAGE in $CODES_LIST; do
    echo -e "$BLUE--- Encoding hypothesis: $RED $CODAGE $BLUE---$NORMAL"
    echo
    # echo -e "$RED Press 'y' and validate if no list is displayed $NORMAL"
    convmv -f $CODAGE -t utf-8 -r * 2>&1 | grep -v Perl | grep -v Starting | grep -v notest | grep -v Skipping > /tmp/affichage_convmv.txt
    NOMBRE_FICHIERS=$(cat /tmp/affichage_convmv.txt | wc -l)
    if [ $NOMBRE_FICHIERS -eq 0 ]; then
        echo
        echo -e "$RED No filename to convert " $NORMAL
        echo
        echo -e "$BLUE Exiting program ... $NORMAL"
        echo
        rm /tmp/affichage_convmv.txt 2>/dev/null
        exit 0
    fi

    # sed 's ..  ' source.txt   ==> this removes the first 2 characters from a string
    echo -e $GREEN "Original filenames coded in $CODAGE: " $NORMAL
    # ALTERNATIVE cat /tmp/affichage_convmv.txt | cut -f 2 -d '"' | sed 's ..  '
    cat /tmp/affichage_convmv.txt | cut -f 2 -d '"'
    echo
    echo -e $GREEN "Filenames converted to UTF-8: " $NORMAL
    # ALTERNATIVE cat /tmp/affichage_convmv.txt | cut -f 4 -d '"' | sed 's ..  '
    cat /tmp/affichage_convmv.txt | cut -f 4 -d '"'
    echo

    echo -n -e $GREEN "Found encoding? $RED [N]$NORMAL""o /$RED y$NORMAL""es /$RED q$NORMAL""uit: "
    read confirm
    echo

    # request for file conversion using convmv
    if [ "$confirm" = Y ] || [ "$confirm" = y ]; then
        echo -e "$BLUE Convert filenames now from encoding $CODAGE? $NORMAL"
        echo -e "$BLUE   ==> convmv -f $CODAGE -t utf-8 * --notest $NORMAL"
        echo -n -e $GREEN "Confirm conversion $RED [N]$NORMAL""o /$RED y$NORMAL""es /$RED r$NORMAL""ecursive: "
        read confirm
        echo

        case $confirm in
            Y|y)    convmv -f $CODAGE -t utf-8 * --notest 2>/dev/null
                    echo
                    echo -e "$BLUE File name conversion done... $NORMAL" ;;
            R|r)    convmv -f $CODAGE -t utf-8 * -r --notest 2>/dev/null
                    echo
                    echo -e "$BLUE Recursive file name conversion done... $NORMAL" ;;
            *)      echo -e "$BLUE Exiting program... $NORMAL" ;;
        esac

        echo
        rm /tmp/affichage_convmv.txt 2>/dev/null
        exit 0

    # request for program exit
    elif [ "$confirm" = Q ] || [ "$confirm" = q ]; then
        echo -e "$BLUE Exiting program... $NORMAL"
        echo
        rm /tmp/affichage_convmv.txt 2>/dev/null
        exit 0
    fi
    clear
done
rm /tmp/affichage_convmv.txt 2>/dev/null

  • Hmmm, the original conversion had lots of pages using "windows-1252" and a few using "UTF-8" for the "charset". The latter had issues displaying special characters in page content, usually as the "black diamond ?" symbol. The other seemed to have issues with JavaScript and also some display issues. After a web search on that issue I changed all "charset=" values to "iso-8859-1", which corrected all of those issues. But now I have the special characters in file names problem... I'll update my question with this information. – RDK Feb 19 '23 at 19:48
  • The "black diamond ?" you called is accurately Unicode replacement character – n0099 Feb 20 '23 at 06:12

You need to open each file with Notepad++, Sublime Text, VS Code, or some other text editor that supports switching the character encoding, switch it to UTF-8, and then save the file. If you are editing the files directly on Linux, you might consider using iconv to convert each file to UTF-8 encoding.
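
For example, something along these lines converts the text files in place; it assumes the originals really are windows-1252 and that the site lives under /var/www/html (both assumptions, so test it on a copy first):

find /var/www/html -type f \( -name '*.html' -o -name '*.htm' -o -name '*.js' \) -print0 |
while IFS= read -r -d '' f; do
    # convert the contents from windows-1252 to UTF-8, then replace the original file
    iconv -f WINDOWS-1252 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done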

Then, after you have converted all your text-based files to UTF-8, test nginx again and the characters should display. If not, you can also try adding this line (in nginx.conf or whichever .conf file holds your server configuration):

charset UTF-8;


The files and the web server must use the same charset, so converting everything to UTF-8 is the simplest way to avoid these issues in the future.
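
As a quick sanity check once everything is converted (the URL and file name below are only examples), you can ask nginx what it is actually serving:

curl -sI http://localhost/ | grep -i content-type    # should now report charset=utf-8
curl -I 'http://localhost/images/caf%C3%A9.jpg'      # an accented name, percent-encoded as UTF-8, should return 200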