Guide
Overview
Installation
nkf
NKFtool requires nkf to be installed on your system. Major Unix-like operating systems offer a precompiled version through their package managers.
For example, on Ubuntu nkf can be installed with apt-get:
sudo apt-get install nkf
or, on macOS, with Homebrew:
brew install nkf
NKFtool
NKFtool also requires Julia v1.0 or above.
To install NKFtool with Julia's package manager, press ] at the Julia prompt to enter the package manager prompt, and run:
(v1.1) pkg> add NKFtool
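Before using NKFtool, it can help to confirm that the nkf executable is visible from Julia. The following is a minimal sketch using Sys.which from Julia's Base; the helper name nkf_available is hypothetical and not part of NKFtool.

```julia
# Hypothetical helper (not part of NKFtool): report whether an nkf
# executable can be found on the PATH of the current Julia process.
nkf_available() = Sys.which("nkf") !== nothing

if nkf_available()
    println("nkf found at ", Sys.which("nkf"))
else
    println("nkf not found; install it with your package manager first")
end
```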
Convert a string
The nkf command can guess the encoding of Japanese text.
To guess the encoding of a string, call nkf_guess(from::String), where from is the input string:
julia> nkf_guess(raw"こんにちわ")
"UTF-8"
julia> nkf_convert( raw"こんにちわ", "-j") |> nkf_guess
"ISO-2022-JP"
julia> nkf_convert( raw"こんにちわ", "-e") |> nkf_guess
"EUC-JP"
julia> nkf_convert( raw"こんにちわ", "-s") |> nkf_guess
"Shift_JIS"
To convert the encoding of a string, call nkf_convert(from::String, options="-w -m0"), where from is the input string.
The second parameter, options, is passed to the nkf command to specify its action.
The default value of options is -w -m0 (UTF-8 output, no MIME decoding), which keeps the result in UTF-8, the encoding of Julia's standard strings.
julia> nkf_convert(raw"こんにちわ")
"こんにちわ"
julia> nkf_convert(raw"こんにちわ", "-w -m0")
"こんにちわ"
To convert the encoding, specify exactly one output encoding option: -j (ISO-2022-JP), -s (Shift_JIS), -e (EUC-JP), or -w (UTF-8). If you know the encoding of the input string, you may also specify one input encoding option: -J, -S, -E, or -W.
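The rule of one output flag plus at most one input flag can be captured in a small helper that builds the options string from encoding names. This is only a sketch: nkf_options, OUTPUT_FLAG, and INPUT_FLAG are hypothetical names, not part of NKFtool, and it assumes that each input flag corresponds to the same encoding as its lowercase output counterpart.

```julia
# Hypothetical helper (not part of NKFtool): build an nkf options string
# from encoding names, using only the flags described in this guide.
const OUTPUT_FLAG = Dict(
    "ISO-2022-JP" => "-j",
    "Shift_JIS"   => "-s",
    "EUC-JP"      => "-e",
    "UTF-8"       => "-w",
)

# Assumption: the input-side flags are the uppercase forms of the output flags.
const INPUT_FLAG = Dict(name => uppercase(flag) for (name, flag) in OUTPUT_FLAG)

function nkf_options(to::AbstractString; from::Union{AbstractString,Nothing}=nothing)
    opts = String[]
    # At most one input flag, and only when the caller knows the encoding.
    from === nothing || push!(opts, INPUT_FLAG[from])
    # Exactly one output flag.
    push!(opts, OUTPUT_FLAG[to])
    # Keep MIME decoding off, as in the default options.
    push!(opts, "-m0")
    return join(opts, " ")
end

nkf_options("UTF-8")                        # "-w -m0"
nkf_options("EUC-JP"; from = "Shift_JIS")   # "-S -e -m0"
```

The resulting string can be passed as the second argument of nkf_convert.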
The nkf_convert function returns the output text as a string. Because Julia strings in encodings other than UTF-8 are not printable, it is good practice to encode them into printable characters, e.g. with Base64.base64encode(), as follows:
julia> using Base64
julia> nkf_convert( raw"こんにちわ", "-j") |> base64encode
"GyRCJDMkcyRLJEEkbxsoQg=="
julia> String(base64decode(ans)) |> nkf_convert
"こんにちわ"
Convert a text stream
Both nkf_guess and nkf_convert also accept an input text stream as the first argument.
The following code uses nkf_guess(from::IO) to guess the encoding of a text file.
julia> open("hello_sjis.txt", "w") do f
           print(f, nkf_convert(raw"こんにちわ", "-s"))
       end

julia> encoding = open("hello_sjis.txt") do f
           nkf_guess(f) # <==
       end
"Shift_JIS"
The following code uses nkf_convert(from::IO, options="-w -m0") to convert a text file from Shift_JIS to UTF-8 and read it as a Julia string.
julia> hello_utf = open("hello_sjis.txt") do f
           nkf_convert(f, "-w -m0") # <==
       end
"こんにちわ"