Strings
A string is a sequence of zero or more characters, and one of the most frequently used types in programming. It is therefore fitting that we acquaint ourselves with the idea of operating on strings.
String and character literals
You might be familiar by now with string and character literals, either from the introductory chapter or from other programming languages. A string literal is surrounded by double quotes: "string". Within the string, you can escape a double quote using a backslash:
Strings are immutable and indexable – indices return the characters at the index position, starting from 1.
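As a minimal sketch of escaping and indexing (the variable names and sample text here are illustrative, not taken from the original examples):

```julia
# A backslash lets a double quote appear inside a string literal.
quoted = "She said \"hello\""
println(quoted)

# Indexing starts at 1 and returns a Char, not a String.
word = "Julia"
first_char = word[1]
println(first_char)          # J
println(typeof(first_char))  # Char
```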
The difference between string and character literals
String and character literals are differentiated by two indicia:
- a string may have any length, while a Char object always represents exactly one character, and
- strings are introduced and terminated by double quotation marks "", while Char objects are delimited by single apostrophes ''.
The second of these tends to be somewhat vexing for many programmers who are used to the equivalence of '' and "" in languages that do not have a dedicated type or class for characters mirroring Char. So while, for instance, 'a' == "a" holds in Python, this is not the case in Julia:
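A short sketch of the distinction, using illustrative variable names:

```julia
char_a = 'a'      # single quotes: a Char
string_a = "a"    # double quotes: a String

println(typeof(char_a))    # Char
println(typeof(string_a))  # String

# A Char and a String are never equal, even when they look alike.
same = char_a == string_a
println(same)              # false

# Converting the Char to a String first makes the comparison succeed.
same_after_conversion = string(char_a) == string_a
println(same_after_conversion)  # true
```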
Heredocs and multiline literals
Multiline literals allow you to keep longer spans of text, including line breaks, within a single string. They are introduced, similarly to Python, by triple double quotation marks """:
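For instance, a two-line excerpt might be stored like this (a sketch; the text and the variable name are chosen for illustration):

```julia
# The newline directly after the opening """ is not part of the string.
excerpt = """
We hold these truths to be self-evident,
that all men are created equal"""

println(excerpt)

# The line break between the two lines is kept as a '\n' character.
line_breaks = count(==('\n'), excerpt)
println(line_breaks)  # 1
```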
As you can see, the use of the """ or 'heredoc' format has preserved the line breaks and structure of the text, a rather helpful feature where longer texts are concerned.
Regex literals
Regular expressions (regexes) are special strings that represent particular patterns. They are useful in matching and searching text, and a good knowledge of regexes is essential for any good functional programmer.
To construct a regex literal, preface the string with r:
This is a regex literal that matches (English) vowels. Julia recognises regex literals as the type Regex:
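A minimal sketch, assuming a simple character class as the vowel pattern (the exact pattern used originally is not shown here):

```julia
# An r prefix turns a string literal into a regex literal.
vowels = r"[aeiou]"

println(typeof(vowels))              # Regex
println(occursin(vowels, "rhythm"))  # false: no vowels to match
println(occursin(vowels, "regex"))   # true
```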
String operations
Substrings
Because strings are indexable, we can use ranges to select a part of a string, something we generally refer to as a substring or string subsetting:
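A brief sketch, assuming a variable named declaration that holds the opening words of the Declaration of Independence:

```julia
declaration = "We hold these truths to be self-evident"

# A range of indices selects the characters at those positions.
first_words = declaration[1:7]
println(first_words)  # We hold
```

Note that indices refer to byte positions, so this simple picture holds for ASCII text such as the sample above.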
You might recall that a range can also have a step, which we can use to obtain every nth letter within a text. Let's see every odd-numbered letter within the first few words of the Declaration of Independence:
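One way this might look, with an illustrative variable holding the first four words:

```julia
opening = "We hold these truths"

# start:step:stop, so 1:2:end picks positions 1, 3, 5, ...
odd_letters = opening[1:2:end]
println(odd_letters)  # W odteetuh
```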
You might remember that end, which we used above to extend the range across the entire length of the string, behaves like a number. Therefore, you can use it to create a substring that excludes the last, say, five letters:
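A small sketch of arithmetic on end, again with an illustrative sample string:

```julia
opening = "We hold these truths"

# end is the last index, so end-5 stops five characters early.
trimmed = opening[1:end-5]
println(trimmed)  # We hold these t
```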
Concatenation, splitting and interpolation
Concatenating and repeating
In many programming languages, maths and string operations correspond, so you can use + to concatenate and * to repeat a string. This is not the case in Julia: + has no method for Strings. What you would expect + to do is accomplished by *:
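A minimal sketch with illustrative strings:

```julia
# * joins strings end to end.
greeting = "Hello, " * "world"
println(greeting)  # Hello, world

# string() is an alternative that accepts any number of arguments.
same_greeting = string("Hello, ", "world")
println(same_greeting == greeting)  # true
```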
So how do you multiply a sequence of text? Easy: use the ^ operator. This is useful if you happen to have been set the old school punishment of 'lines' (writing the same sentence all over again).
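For example (the sentence is made up for illustration):

```julia
line = "I will not talk in class. "

# ^ repeats the string the given number of times.
lines = line ^ 3
println(lines)

# repeat() does the same job under a more descriptive name.
println(lines == repeat(line, 3))  # true
```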
split()
The split() function separates a piece of text at a particular character, which it also removes. The result is an array of the chunks. By default, split() will separate at spaces, but you can provide any other string, not even necessarily a single character, as the third example shows:
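Three illustrative calls, sketched with made-up sample strings:

```julia
# 1. Default: split at whitespace.
words = split("We hold these truths")
println(words)

# 2. Split at a single character.
fields = split("a,b,c", ",")
println(fields)

# 3. Split at a longer delimiter.
parts = split("one--two--three", "--")
println(parts)
```

The chunks are returned as SubString values, which compare equal to ordinary strings with the same contents.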
If you provide "" as the string to split at, Julia will split the text into individual letters.
You may also use a regex to split your text at:
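Both variants can be sketched as follows (the sample strings and the pattern are illustrative):

```julia
# An empty delimiter splits the text into individual letters.
letters = split("truth", "")
println(letters)

# A regex delimiter: split wherever one or more digits appear.
chunks = split("one1two22three", r"\d+")
println(chunks)
```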
Needless to say, since strings are immutable, the original string is not affected by the application of split().
Interpolation
String interpolation refers to the incredibly useful capability of including variable values within a string. As you might remember, we have used * above to concatenate strings:
While this is technically correct, it is much more convenient to use string interpolation, in which case we would refer back to the variable love as $(love) within the string. Julia knows this means it is to replace $(love) with the contents of the variable love:
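A sketch comparing the two approaches; the value assigned to love and the surrounding sentence are assumptions made for illustration:

```julia
love = "strings"

# Concatenation with * works, but gets noisy quickly.
concatenated = "I love " * love * "!"

# Interpolation: $(love) is replaced by the value of love.
interpolated = "I love $(love)!"

println(interpolated)                  # I love strings!
println(concatenated == interpolated)  # true
```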
You can put anything within the parentheses in string interpolation, as long as it is something Julia knows how to handle. For instance, you can include an expression in a string:
If, and only if, you are referring to a variable, you can omit the parentheses (but not if you are referring to an expression):
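Both forms might look like this (the variable x is illustrative):

```julia
x = 7

# An expression needs the parentheses.
with_expression = "Seven squared is $(x^2)"
println(with_expression)  # Seven squared is 49

# A bare variable name can drop them.
with_variable = "x is $x"
println(with_variable)    # x is 7
```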
Regular expressions and finding text within strings
As has been mentioned, the main utility of regular expressions (regexes) is to find things within long pieces of text. In the following, we will introduce the three main regex search functions of Julia, match(), matchall() and eachmatch(), with reference to a bit of the Declaration of Independence:
If you are familiar with regular expressions, plod ahead! However, if regex syntax looks like gobbledygook to you or you feel your regex fu is a little rusty, put down this book and consult the Regex cheatsheet or, even better, Jeffrey Friedl's amazing book on mastering regexes.
Finding substrings
If you are only concerned with finding a single instance of a search term within a string, the findfirst() function returns the range index of where the search expression appears:
findfirst() also accepts regular expressions:
To retrieve the result, rather than its index, you can pass the resulting index off to the string as the subsetting range, using the square bracket [] syntax:
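The three steps above can be sketched as follows, assuming declaration holds the opening words of the Declaration of Independence and using an illustrative pattern:

```julia
declaration = "We hold these truths to be self-evident"

# Searching for a plain string returns the range it occupies.
string_range = findfirst("truths", declaration)
println(string_range)  # 15:20

# Searching with a regex: a 't' followed by word characters.
regex_range = findfirst(r"t\w+", declaration)
println(regex_range)   # 9:13

# Use the range as an index to see what was found.
found = declaration[regex_range]
println(found)         # these
```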
Ah, so that's the word it found!
Where a search string is not found, findfirst() will yield nothing.
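A short sketch of the unsuccessful case, with the same illustrative declaration string:

```julia
declaration = "We hold these truths to be self-evident"

# No sausages in the Declaration, so the result is nothing.
missing_range = findfirst("sausages", declaration)
println(missing_range === nothing)  # true
```

Checking for nothing before indexing with the result avoids an error when the search term is absent.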
Finding using the match() family of functions
The problem with findfirst() is that it retrieves one, and only one, result: the first within the string passed to it. The match() family of functions can help us with finding more results:
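- match() retrieves either the first match or nothing within the text,
- matchall() returned an array of all matching substrings in versions of Julia before 1.0 (it has since been removed, and collecting the results of eachmatch() does the same job), and
- eachmatch() returns an iterator over all matches.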
The match() family of functions needs a regular expression literal as a search argument. This is so even if the regular expression does not make use of any pattern matching beyond a simple string. Thus, match(r"truths", declaration) is valid, while match("truths", declaration) yields an error:
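A sketch of both calls, assuming the same declaration string as before; the try block is only there so that the error can be inspected rather than stopping the script:

```julia
declaration = "We hold these truths to be self-evident"

# A regex literal as the search argument works.
m = match(r"truths", declaration)
println(m.match)  # truths

# A plain string does not: there is no such method of match().
error_seen = try
    match("truths", declaration)
    nothing
catch err
    err
end
println(error_seen isa MethodError)  # true
```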
Understanding RegexMatch objects
Most regex search functions return an object of type RegexMatch. As the name reveals, a RegexMatch is a composite type representing a match. As such, it encapsulates (to use a little more OOP terminology than one would normally be allowed to in a book on functional programming) several values, the first three of which will be of immediate interest to us:
- RegexMatch.match is the matched substring,
- RegexMatch.captures is an array of the substrings captured by the capture groups of the regex (with nothing for any group that did not take part in the match), and
- RegexMatch.offset is an integer (generally an Int64) that represents the index of the first character of the matched string where there is a single match (e.g. when using match()).
To illustrate, let's consider the result of a match() call, which will be introduced in the next subsection:
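A sketch with an illustrative pattern containing two capture groups, so that all three fields have something to show:

```julia
declaration = "We hold these truths to be self-evident"

# Two groups of word characters joined by a hyphen.
m = match(r"(\w+)-(\w+)", declaration)

println(m.match)     # self-evident
println(m.captures)  # the two captured substrings: self, evident
println(m.offset)    # 28
```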
match()
match() retrieves the first match or nothing. In this sense, it is rather similar to findfirst():
The result is a RegexMatch object. The object can be inspected using .match (e.g. match(r"truths", declaration).match).
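Both outcomes might be sketched like this, using the same illustrative declaration string:

```julia
declaration = "We hold these truths to be self-evident"

# A successful search returns a RegexMatch.
hit = match(r"truths", declaration)
println(typeof(hit))  # RegexMatch
println(hit.match)    # truths

# An unsuccessful search returns nothing.
miss = match(r"sausages", declaration)
println(miss === nothing)  # true
```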
eachmatch()
eachmatch() returns an object known as an iterator, specifically of the type RegexMatchIterator. We have on and off encountered iterators, but we will not really deal with them in depth until chapter [X], which deals with control flow. Suffice it to say an iterator is an object that contains a list of items that can be iterated through. The iterator will iterate over a list of RegexMatch objects, so if we want the results themselves, we will need to access the .match field on each of them:
gives the following result:
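A minimal sketch of iterating over matches, assuming an illustrative pattern (a 't' followed by word characters):

```julia
declaration = "We hold these truths to be self-evident"

# Each item the iterator yields is a RegexMatch.
for m in eachmatch(r"t\w+", declaration)
    println(m.match)
end

# A comprehension collects the matched substrings into an array.
all_matches = [m.match for m in eachmatch(r"t\w+", declaration)]
println(all_matches)
```

The comprehension in the second half is also the usual replacement for the older matchall() function.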
occursin()
occursin() returns a boolean value depending on whether the search text contains a match for the regex provided.
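For example, with the same illustrative declaration string:

```julia
declaration = "We hold these truths to be self-evident"

# The pattern comes first, the text to search second.
has_truths = occursin(r"truths", declaration)
has_sausages = occursin(r"sausages", declaration)

println(has_truths)    # true
println(has_sausages)  # false
```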
Replacing substrings
Julia can replace substrings using the replace() function. Let's try putting some sausages into the Declaration of Independence!
By increasing the value of the count keyword argument, you can replace the remaining occurrences of "truth" as well.
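A sketch of replace() with and without count, using a made-up sample string that contains the search term twice:

```julia
sample = "truths and more truths"

# count = 1 replaces only the first occurrence.
once = replace(sample, "truths" => "sausages"; count = 1)
println(once)   # sausages and more truths

# Without count, every occurrence is replaced.
every = replace(sample, "truths" => "sausages")
println(every)  # sausages and more sausages

# The original string is left untouched.
println(sample) # truths and more truths
```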
Much more dignified than self-evident sausages, I'd say! At risk of repeating myself, it is important to note that since strings are immutable, replace() merely returns a copy of the string with the search string replaced by the replacement string or the result of the replacement function, and the original string itself will remain unaffected.
Where the substring is not found, the result will be, unsurprisingly, an unaltered string.
Regex flags
A little-known feature of Julia regexes is that one or more flags can be appended to a regex. These, like most of Julia's regex capability, derive from Perl's regular expression syntax, as documented in perlre.
| Flag | Function |
| --- | --- |
| i | Case-insensitive pattern matching. |
| m | Treats the string as a multiline string, so that ^ and $ will refer to the start or end of any line within the string. |
| s | Treats the string as a single line. This will result in . accepting a newline as well. When used together with m, it will result in . matching every possible character while still allowing ^ and $ to match just after and just before newlines within the string. |
| x | Ignores whitespace in the pattern that is neither backslashed nor within a character class. |
Flags are appended to the end of each regex, which might strike users more familiar with e.g. the Pythonic way of modifying the regex search object itself as somewhat unusual:
In this case, the regex r"^We" was augmented by the multiline flag, appended at its end.
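The effect of a flag can be sketched as follows; the two-line sample text is made up so that "We" starts the second line rather than the whole string:

```julia
text = "When in the Course of human events\nWe hold these truths"

# Without a flag, ^ only matches at the very start of the string.
plain = occursin(r"^We", text)
println(plain)      # false

# With the m flag, ^ also matches at the start of each line.
multiline = occursin(r"^We"m, text)
println(multiline)  # true

# The i flag makes matching case-insensitive.
insensitive = occursin(r"we hold"i, text)
println(insensitive)  # true
```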
String transformation and testing
Case transformations
Case transformations are functions that act on Strings and transform character case. Let's examine the effect of these transformations in turn.
| Function | Effect | Result |
| --- | --- | --- |
| uppercase() | Converts the entire string to upper-case characters | WE HOLD THESE TRUTHS TO BE SELF-EVIDENT |
| lowercase() | Converts the entire string to lower-case characters | we hold these truths to be self-evident |
| uppercasefirst() | Converts the first character of the string to upper-case | We hold these truths to be self-evident |
| lowercasefirst() | Converts the first character of the string to lower-case | we hold these truths to be self-evident |
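The four functions can be tried out as follows, assuming the sample sentence from the table as input:

```julia
sentence = "We hold these truths to be self-evident"

upper = uppercase(sentence)
lower = lowercase(sentence)
upper_first = uppercasefirst(lower)     # capitalise the lower-cased copy
lower_first = lowercasefirst(sentence)

println(upper)
println(lower)
println(upper_first)
println(lower_first)
```

Each call returns a new string; sentence itself is unchanged.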
Testing and attributes