
This sounds like a simple question, but I cannot solve it. I want to print, preferably with the knitr::kable() function, a data frame that contains a string with a 'greater than or equal to' sign (≥) or its opposite (≤) in R, but the sign is converted to an equals sign (=) when printed. I will show the problem first, and then what I have already tried to find the answer.

library(knitr)
minimal.example <- data.frame(x= "≥10",y="≤20")
# note: same results with data.frame(x="\U2265 10", y="\U2264 20")
knitr::kable(minimal.example)

output:

x y
=10 =20

Expected output:

x y
≥10 ≤20

I know from the answers here and here that this problem occurs in plain R as well as in RStudio, and only in R installed under Windows; thus it is not reproducible on macOS or Linux. A suggestion made here to use the expression() function does not work in my case, probably because I am on a Windows machine. The problem also occurs with the base R print function, print(minimal.example), so it is not restricted to kable(). I have updated R to the latest version but still get the same result. I have also tried a different locale (Dutch_Netherlands.1252), and someone else tried a US locale, without effect.

Two questions:

  1. Can someone explain what is going on? (My guess is that it happens in the base R data.frame function.)
  2. How can I solve this problem to get the desired result? I need to be able to convert the table to both LaTeX and HTML within an R Markdown document (usually no problem with the kable function).

Any help is greatly appreciated!

Session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.30

loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3    highr_0.8      xfun_0.19    
Leon Samson
  • Windows-1252 (your locale encoding) does not have the inequality symbols. This may be the reason you do not see the problem on other OSes (other OSes use UTF-8 by default). So check how to set the terminal and the OUTPUT of R to UTF-8 (and also in a manner that output could also be shown – Giacomo Catenazzi Jan 22 '21 at 12:13
  • Thanks for the reply. I searched for changing the locale to UTF-8; maybe it's just not possible under Windows? see [here](https://stackoverflow.com/a/46734577/11856430) for instance. Default text encoding within Rstudio is set to UTF-8 but the problem is still there. – Leon Samson Jan 22 '21 at 12:38
  • Your code works with `locales <- c("LC_COLLATE","LC_CTYPE","LC_MONETARY","LC_NUMERIC","LC_TIME"); for (x in locales) { Sys.setlocale(category = x, locale="English_United Kingdom.437")}` (I know, `CP437` might fail with more complex data). – JosefZ Jan 22 '21 at 14:03
  • CP437 is worse than 1252. Windows is capable of using UTF-8, and many terminals can do it. I think the R FAQ has something about it (on how to set it up). You should probably set it before starting R (so in the terminal, or in a .bat file which will start R – Giacomo Catenazzi Jan 22 '21 at 14:24
  • Apparently, as of July 2020, UTF-8 support for R on Windows was still experimental: [Windows/UTF-8 Build of R and CRAN Packages](https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages/index.html) – Where's my towel Jan 22 '21 at 14:45
  • Weird: providing that `me <- data.frame(x="\U2265 10", y="\U2264 20")` then `me` shows _Equals Signs_ however `paste(me[1,'x'],me[1,'y'], sep = " ; ")` shows `[1] "≥ 10 ; ≤ 20"` so the problem is rather in `data.frame` function itself… – JosefZ Feb 25 '21 at 21:45
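
The observation in the last comment can be checked directly. The following is a minimal sketch, not a diagnosis: it assumes only base R, and the variable names are illustrative. It looks at the code point stored in the string, at the session locale, and then at the same value printed on its own and inside a data frame. If the code point is intact but the data frame print shows `=`, the substitution is happening on output rather than in the stored data.

```r
# Illustrative values; \u escapes avoid depending on the script's file encoding
x <- "\u2265 10"

# Code point of the first character: 8805 is U+2265 (greater-than or equal to)
utf8ToInt(x)[1]

# Declared encoding of the string and whether the session locale is UTF-8
Encoding(x)
l10n_info()[["UTF-8"]]

# Same value, printed as a bare vector and as a data frame column;
# on an affected Windows setup the two may differ
print(x)
print(data.frame(x = x))
```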

1 Answer


So, based on the comments on my question above: I think that until R fully supports UTF-8 under Windows (see here), this problem will keep occurring without a general solution, and we have to work around it.

Workaround 1, as proposed by @JosefZ: use the CP437 encoding, which contains the special signs that I need, by setting the locale with the Sys.setlocale function (English_United Kingdom.437). The downside is that CP437 covers fewer other special characters, so it will not work in all cases. See here for which characters are supported.
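
A minimal sketch of Workaround 1 that restores the original locale afterwards. The helper name `with_cp437` is hypothetical and not part of any package; the locale string is the one from the comment by @JosefZ. LC_NUMERIC is left out here as an assumption, because R warns that changing it may cause R to function strangely. On non-Windows systems the helper simply evaluates the expression unchanged.

```r
# Hypothetical helper: evaluate `expr` while the Windows locale uses code page 437
with_cp437 <- function(expr) {
  # Only Windows understands this locale name; elsewhere just evaluate expr
  if (.Platform$OS.type != "windows") return(expr)
  cats <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_TIME")
  # Remember the current settings and restore them when the function exits
  old <- vapply(cats, Sys.getlocale, character(1))
  on.exit(for (k in cats) Sys.setlocale(k, old[[k]]), add = TRUE)
  for (k in cats) Sys.setlocale(k, "English_United Kingdom.437")
  # expr is evaluated lazily, so it runs here, under the temporary locale
  expr
}

df <- data.frame(x = "\u2265 10", y = "\u2264 20")
# Printing has to happen inside the call to be affected by the locale
with_cp437(print(df))
```

Because the locale is switched only for the duration of the call, the rest of the session keeps its original settings.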

Workaround 2: Using a regex to replace the symbols with their LaTeX equivalents seems to work in specific cases, specifically for knitr::kable() tables. It is a bit long, though, and the replaced characters also have to be enclosed in dollar signs, e.g. $\leq$ (written as "\\\\leq" in the str_replace_all() replacement string). Also, tibbles work better than normal base R data frames. Note: this solution does not work when using tibble(x="≥10",y="≤20"), only when using tibble(x="\U2265 10", y="\U2264 20"). It seems to work when you read a table with any of the readr or read_excel functions, which is how I need to use it.

library(knitr)
library(dplyr)
library(stringr)

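# Build the example with \U escapes (literal symbols in the source do not work here)
minimal.example <- dplyr::tibble(x = "\U2265 10", y = "\U2264 20")
adjusted.data <- minimal.example %>%
  # Swap each symbol for its LaTeX command; "\\\\leq" yields a literal \leq
  mutate(across(everything(), ~str_replace_all(., "\U2264", "\\\\leq"))) %>%
  mutate(across(everything(), ~str_replace_all(., "\U2265", "\\\\geq"))) %>%
  # Wrap only the cells that now contain a command in $...$ (math mode)
  mutate(across(everything(),
                ~ifelse(
                  str_detect(., "\\\\leq|\\\\geq"),
                  trimws(paste0("$", ., "$")),
                  .
                )))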

knitr::kable(adjusted.data)

This gives the HTML table below as output, which renders properly in an (R) Markdown environment:

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> x </th>
   <th style="text-align:left;"> y </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> $\geq 10$ </td>
   <td style="text-align:left;"> $\leq 20$ </td>
  </tr>
</tbody>
</table>
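
For comparison, the same replacement can be sketched in base R without dplyr or stringr. This is a minimal sketch rather than a tested alternative on the affected Windows setup; the helper name `to_math` is hypothetical. It uses fixed (non-regex) matching, so the replacement strings need only one escaped backslash.

```r
# Hypothetical helper: replace the two symbols with LaTeX commands and wrap
# the affected cells in $...$; non-character columns are returned unchanged
to_math <- function(col) {
  if (!is.character(col)) return(col)
  hit <- grepl("\u2264", col, fixed = TRUE) | grepl("\u2265", col, fixed = TRUE)
  col <- gsub("\u2264", "\\leq", col, fixed = TRUE)
  col <- gsub("\u2265", "\\geq", col, fixed = TRUE)
  col[hit] <- paste0("$", trimws(col[hit]), "$")
  col
}

df <- data.frame(x = "\u2265 10", y = "\u2264 20", stringsAsFactors = FALSE)
# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, to_math)
knitr::kable(df)
```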

Any improvements or better answers are greatly appreciated.

Leon Samson