StringHelper

String handling class for UTF-8 data wrapping the phputf8 library. All functions assume the validity of UTF-8 strings.

abstract
since

1.3.0

package

Joomla Framework

Methods

compliant

Tests whether a string complies as UTF-8.

compliant(string str) : bool
static

This is much faster than StringHelper::valid(), but it accepts five- and six-octet UTF-8 sequences, which Unicode does not support and which therefore cannot be displayed correctly in a browser. In other words, it is less strict than StringHelper::valid() but faster. If you use it to validate user input, attackers may be able to inject 5- and 6-byte sequences (which may or may not be a significant risk, depending on what you are doing).

see StringHelper::valid
link
since

1.3.0

Arguments

str

string: UTF-8 string to check

Response

bool: TRUE if string is valid UTF-8
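To make the difference in strictness concrete, the sketch below contrasts a purely structural check with a strict one. It is not the phputf8 implementation: the function names `isStructurallyUtf8` and `isStrictUtf8` are hypothetical, and the strict variant assumes a PHP build whose PCRE library rejects 5- and 6-byte sequences in `/u` mode.

```php
<?php
// Hypothetical helper: accepts any lead byte followed by the right number of
// continuation bytes, including the 5- and 6-byte forms excluded by RFC 3629.
function isStructurallyUtf8(string $str): bool
{
    $pattern = '/^(?:[\x00-\x7F]'
        . '|[\xC0-\xDF][\x80-\xBF]'
        . '|[\xE0-\xEF][\x80-\xBF]{2}'
        . '|[\xF0-\xF7][\x80-\xBF]{3}'
        . '|[\xF8-\xFB][\x80-\xBF]{4}'
        . '|[\xFC-\xFD][\x80-\xBF]{5})*+$/D';

    return preg_match($pattern, $str) === 1;
}

// Hypothetical helper: relies on PCRE's own UTF-8 validation in /u mode
// (assumption: the installed PCRE rejects sequences longer than 4 bytes).
function isStrictUtf8(string $str): bool
{
    return preg_match('//u', $str) === 1;
}

$fiveByte = "\xF8\x88\x80\x80\x80"; // structurally well formed, but not valid Unicode
var_dump(isStructurallyUtf8($fiveByte));
var_dump(isStrictUtf8($fiveByte));
```

The structural check is the kind of test that lets long sequences through, which is the trade-off the description above warns about.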

increment

Increments a trailing number in a string.

increment(string string, string|null style = 'default', int n) : string
static

Used to easily create distinct labels when copying objects. The method has the following styles:

default: "Label" becomes "Label (2)"

dash: "Label" becomes "Label-2"

since

1.3.0

Arguments

string

string: The source string.

style

string|null: The style (default|dash).

n

int: If supplied, this number is used for the copy; otherwise the 'next' number is used.

Response

string: The incremented string.
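The two styles can be illustrated with a small stand-alone sketch. The function `incrementLabel` and its `$styles` table are hypothetical and only mirror the behaviour described above; they are not the class's own code, and treating `$n = 0` as "not supplied" is an assumption.

```php
<?php
// Hypothetical helper illustrating the "default" and "dash" styles.
function incrementLabel(string $label, string $style = 'default', int $n = 0): string
{
    // Per style: pattern for an existing suffix, format to replace it, format to append a new one.
    $styles = [
        'default' => ['/\((\d+)\)$/', '(%d)', ' (%d)'],
        'dash'    => ['/-(\d+)$/', '-%d', '-%d'],
    ];
    [$pattern, $replaceFormat, $appendFormat] = $styles[$style] ?? $styles['default'];

    if (preg_match($pattern, $label, $matches)) {
        // A trailing number exists: use $n if given, otherwise the next number.
        $next = $n > 0 ? $n : (int) $matches[1] + 1;

        return preg_replace($pattern, sprintf($replaceFormat, $next), $label);
    }

    // No trailing number yet: start at 2 unless $n is given.
    $next = $n > 0 ? $n : 2;

    return $label . sprintf($appendFormat, $next);
}

echo incrementLabel('Label'), "\n";
echo incrementLabel('Label', 'dash'), "\n";
```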

is_ascii

Tests whether a string contains only 7bit ASCII bytes.

is_ascii(string str) : bool
static

You might use this to check whether a string needs handling as UTF-8 at all, potentially gaining performance by using the native PHP equivalent when the string is plain ASCII. For example:

    if (StringHelper::is_ascii($someString)) {
        // It's just ASCII - use the native PHP version
        $someString = strtolower($someString);
    } else {
        $someString = StringHelper::strtolower($someString);
    }

since

1.3.0

Arguments

str

string: The string to test.

Response

bool: True if the string is all ASCII

ltrim

UTF-8 aware replacement for ltrim()

ltrim(string str, string|bool charlist = false) : string
static

Strip whitespace (or other characters) from the beginning of a string. You only need this method if you supply the optional charlist argument and it contains multi-byte UTF-8 characters; otherwise the native ltrim() works correctly on a UTF-8 string.

link
since

1.3.0

Arguments

str

string: The string to be trimmed

charlist

string|bool: The optional charlist of additional characters to trim

Response

string: The trimmed string
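The reason a multi-byte charlist needs special handling can be shown with a short sketch. The helper `ltrimUtf8` is hypothetical (it is not the library's code) and assumes valid UTF-8 input; it builds a character class from the charlist and trims whole characters rather than bytes.

```php
<?php
// Hypothetical helper: left-trim whole UTF-8 characters listed in $charlist.
function ltrimUtf8(string $str, string $charlist): string
{
    if ($charlist === '') {
        // No charlist: native whitespace trimming only removes ASCII bytes.
        return ltrim($str);
    }

    // preg_quote() escapes characters that are special inside a character class.
    $class = preg_quote($charlist, '/');

    return preg_replace('/^[' . $class . ']+/u', '', $str);
}

$str      = "\xC3\xA4bc"; // "äbc"
$charlist = "\xC3\xB6";   // "ö", which shares its first byte with "ä"

$native = ltrim($str, $charlist);     // byte-wise: removes the lone 0xC3 byte
$aware  = ltrimUtf8($str, $charlist); // character-wise: nothing to remove

var_dump(bin2hex($native), bin2hex($aware));
```

The native call leaves a dangling continuation byte, so the result is no longer valid UTF-8. The same reasoning applies to rtrim() and trim().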

ord

UTF-8 aware alternative to ord()

ord(string chr) : int
static

Returns the Unicode ordinal (code point) for a character.

link
since

1.4.0

Arguments

chr

string: UTF-8 encoded character

Response

int: Unicode ordinal for the character
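As an illustration of what "Unicode ordinal" means for a multi-byte character, the sketch below decodes the first character of a string by hand. The function `utf8CodePoint` is hypothetical and assumes a non-empty, valid UTF-8 string of at most four bytes per character.

```php
<?php
// Hypothetical helper: decode the first UTF-8 character of $chr into its code point.
function utf8CodePoint(string $chr): int
{
    $b0 = ord($chr[0]);

    if ($b0 < 0x80) {
        return $b0; // 1 byte: plain ASCII
    }

    if (($b0 & 0xE0) === 0xC0) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return (($b0 & 0x1F) << 6) | (ord($chr[1]) & 0x3F);
    }

    if (($b0 & 0xF0) === 0xE0) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (($b0 & 0x0F) << 12) | ((ord($chr[1]) & 0x3F) << 6) | (ord($chr[2]) & 0x3F);
    }

    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return (($b0 & 0x07) << 18) | ((ord($chr[1]) & 0x3F) << 12)
        | ((ord($chr[2]) & 0x3F) << 6) | (ord($chr[3]) & 0x3F);
}

printf("U+%04X\n", utf8CodePoint("\xE2\x82\xAC")); // the euro sign
```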

rtrim

UTF-8 aware replacement for rtrim()

rtrim(string str, string|bool charlist = false) : string
static

Strip whitespace (or other characters) from the end of a string. You only need this method if you supply the optional charlist argument and it contains multi-byte UTF-8 characters; otherwise the native rtrim() works correctly on a UTF-8 string.

link
since

1.3.0

Arguments

str

string: The string to be trimmed

charlist

string|bool: The optional charlist of additional characters to trim

Response

string: The trimmed string

str_ireplace

UTF-8 aware alternative to str_ireplace()

str_ireplace(string|string[] search, string|string[] replace, string str, int|null|bool count = null) : string
static

Case-insensitive version of str_replace()

link
since

1.3.0

Arguments

search

string|array<string|int, string>: String(s) to search for

replace

string|array<string|int, string>: Replacement string(s)

str

string: The string to search and replace in

count

int|null|bool: Optional count value to be passed by reference

Response

string: UTF-8 string

str_pad

UTF-8 aware alternative to str_pad()

str_pad(string input, int length, string padStr = ' ', int type = STR_PAD_RIGHT) : string
static

Pad a string to a certain length with another string. $padStr may contain multi-byte characters.

link
since

1.4.0

Arguments

input

string: The input string.

length

int: The target length in characters. If the value is negative, less than, or equal to the length of the input string, no padding takes place.

padStr

string: The padding string. It may be truncated if the number of padding characters can't be evenly divided by the string's length.

type

int: The type of padding to apply

Response

string
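A minimal sketch of character-based padding follows. The function `padUtf8` is hypothetical and only demonstrates the idea (counting characters instead of bytes and truncating the pad string as needed); it assumes valid UTF-8 input and is not the library's implementation.

```php
<?php
// Hypothetical helper: pad $input to $length characters (not bytes).
function padUtf8(string $input, int $length, string $padStr = ' ', int $type = STR_PAD_RIGHT): string
{
    // Split into whole UTF-8 characters.
    $split = fn (string $s): array => preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    $padChars = $split($padStr);
    $missing  = $length - count($split($input));

    if ($missing <= 0 || $padChars === []) {
        return $input; // nothing to pad
    }

    // Repeat the pad string character by character, truncating where necessary.
    $build = function (int $n) use ($padChars): string {
        $out = '';
        for ($i = 0; $i < $n; $i++) {
            $out .= $padChars[$i % count($padChars)];
        }

        return $out;
    };

    switch ($type) {
        case STR_PAD_LEFT:
            return $build($missing) . $input;
        case STR_PAD_BOTH:
            $left = intdiv($missing, 2);

            return $build($left) . $input . $build($missing - $left);
        default:
            return $input . $build($missing);
    }
}

echo padUtf8("n\xC3\xA9", 5, "\xE2\x86\x92"), "\n"; // pads "né" with arrows
echo str_pad("n\xC3\xA9", 5, "-"), "\n";            // native: counts bytes
```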

str_split

UTF-8 aware alternative to str_split()

str_split(string str, int splitLen = 1) : array|string|bool
static

Convert a string to an array.

link
since

1.3.0

Arguments

str

string: UTF-8 encoded string to process

splitLen

int: Number of characters to split string by

Response

array<string|int, mixed>|string|bool
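The following sketch shows one way to split on character boundaries. The function `splitUtf8` is hypothetical and assumes valid UTF-8 input; it always returns an array, whereas the method documented here may also return a string or bool.

```php
<?php
// Hypothetical helper: split a UTF-8 string into chunks of $splitLen characters.
function splitUtf8(string $str, int $splitLen = 1): array
{
    if ($splitLen < 1) {
        throw new InvalidArgumentException('splitLen must be at least 1');
    }

    // "." with /u and /s matches one whole character, newlines included.
    preg_match_all('/.{1,' . $splitLen . '}/us', $str, $matches);

    return $matches[0];
}

print_r(splitUtf8("a\xC3\xB1b", 2)); // character-based chunks
print_r(array_map('bin2hex', str_split("a\xC3\xB1b", 2))); // native: cuts "ñ" in half
```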

strcasecmp

UTF-8/LOCALE aware alternative to strcasecmp()

strcasecmp(string str1, string str2, string|bool locale = false) : int
static

A case insensitive string comparison.

link
since

1.3.0

Arguments

str1

string: String 1 to compare

str2

string: String 2 to compare

locale

string|bool: The locale used by strcoll, or false to use classical comparison

Response

int: < 0 if str1 is less than str2, > 0 if str1 is greater than str2, and 0 if they are equal.

strcmp

UTF-8/LOCALE aware alternative to strcmp()

strcmp(string str1, string str2, mixed locale = false) : int
static

A case sensitive string comparison.

link
since

1.3.0

Arguments

str1

string: String 1 to compare

str2

string: String 2 to compare

locale

mixed: The locale used by strcoll, or false to use classical comparison

Response

int: < 0 if str1 is less than str2, > 0 if str1 is greater than str2, and 0 if they are equal.

strcspn

UTF-8 aware alternative to strcspn()

strcspn(string str, string mask, int|bool start = null, int|bool length = null) : int
static

Find length of initial segment not matching mask.

link
since

1.3.0

Arguments

str

string: The string to process

mask

string: The mask

start

int|bool: Optional starting character position (in characters)

length

int|bool: Optional length

Response

int: The length of the initial segment of str which does not contain any of the characters in mask

stristr

UTF-8 aware alternative to stristr()

stristr(string str, string search) : string|bool
static

Returns all of the haystack from the first occurrence of the needle to the end. The needle and haystack are compared case-insensitively.

link
since

1.3.0

Arguments

str

string: The haystack

search

string: The needle

Response

string|bool

strlen

UTF-8 aware alternative to strlen()

strlen(string str) : int
static

Returns the number of characters in the string (NOT THE NUMBER OF BYTES).

link
since

1.3.0

Arguments

str

string: UTF-8 string.

Response

int: Number of UTF-8 characters in string.
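The difference between characters and bytes is easy to see with a short sketch. The function `strlenUtf8` is a hypothetical stand-in that assumes valid UTF-8 input; it is not how the class itself counts characters.

```php
<?php
// Hypothetical helper: count UTF-8 characters rather than bytes.
function strlenUtf8(string $str): int
{
    // "." with the /u and /s modifiers matches exactly one character, newlines included.
    return (int) preg_match_all('/./us', $str);
}

$word = "h\xC3\xA9llo"; // "héllo"
var_dump(strlen($word));     // bytes
var_dump(strlenUtf8($word)); // characters
```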

strpos

UTF-8 aware alternative to strpos()

strpos(string str, string search, int|null|bool offset = false) : int|bool
static

Find position of first occurrence of a string.

link
since

1.3.0

Arguments

str

string: String being examined

search

string: String being searched for

offset

int|null|bool: Optional, specifies the position from which the search should be performed

Response

int|bool: Number of characters before the first match, or FALSE on failure
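The sketch below shows why the returned position differs from the native byte offset. The function `strposUtf8` is hypothetical, ignores the optional offset for brevity, and assumes valid UTF-8 input.

```php
<?php
// Hypothetical helper: position of $search in $str, counted in characters.
function strposUtf8(string $str, string $search)
{
    $bytePos = strpos($str, $search);

    if ($bytePos === false) {
        return false;
    }

    // Count the whole characters that precede the match.
    return (int) preg_match_all('/./us', substr($str, 0, $bytePos));
}

$text = "h\xC3\xA9llo w\xC3\xB6rld"; // "héllo wörld"
var_dump(strpos($text, "w\xC3\xB6rld"));     // byte offset
var_dump(strposUtf8($text, "w\xC3\xB6rld")); // character offset
```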

strrev

UTF-8 aware alternative to strrev()

strrev(string str) : string
static

Reverse a string.

link
since

1.3.0

Arguments

str

string: String to be reversed

Response

string: The string in reverse character order
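Reversing by character rather than by byte can be sketched as follows. The function `strrevUtf8` is hypothetical and assumes valid UTF-8 input.

```php
<?php
// Hypothetical helper: reverse a string character by character.
function strrevUtf8(string $str): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    return implode('', array_reverse($chars));
}

var_dump(bin2hex(strrev("a\xC3\xB1b")));     // native: the two bytes of "ñ" are swapped
var_dump(bin2hex(strrevUtf8("a\xC3\xB1b"))); // character order reversed, bytes intact
```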

strrpos

UTF-8 aware alternative to strrpos()

strrpos(string str, string search, int offset) : int|bool
static

Finds position of last occurrence of a string.

link
since

1.3.0

Arguments

str

string: String being examined.

search

string: String being searched for.

offset

int: Offset from the left of the string.

Response

int|bool: Number of characters before the last match, or false on failure

strspn

UTF-8 aware alternative to strspn()

strspn(string str, string mask, int start = null, int length = null) : int
static

Find length of initial segment matching mask.

link
since

1.3.0

Arguments

str

string: The haystack

mask

string: The mask

start

int|null: Optional start position

length

int|null: Optional length

Response

int

strtolower

UTF-8 aware alternative to strtolower()

strtolower(string str) : string|bool
static

Make a string lowercase.

Note: The concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in Chinese writing, for example. See Unicode Standard Annex #21: Case Mappings.

link
since

1.3.0

Arguments

str

string: String being processed

Response

string|bool: Either the string in lowercase, or FALSE if the UTF-8 is invalid

strtoupper

UTF-8 aware alternative to strtoupper()

strtoupper(string str) : string|bool
static

Make a string uppercase.

Note: The concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in Chinese writing, for example. See Unicode Standard Annex #21: Case Mappings.

link
since

1.3.0

Arguments

str

string: String being processed

Response

string|bool: Either the string in uppercase, or FALSE if the UTF-8 is invalid

substr

UTF-8 aware alternative to substr()

substr(string str, int offset, int|null|bool length = false) : string|bool
static

Return part of a string given character offset (and optionally length).

link
since

1.3.0

Arguments

str

string: String being processed

offset

int: Number of UTF-8 characters offset (from left)

length

int|null|bool: Optional length in UTF-8 characters from offset

Response

string|bool
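A character-based substring can be sketched as below. The function `substrUtf8` is hypothetical, assumes valid UTF-8 input, and always returns a string (the documented method may also return a bool).

```php
<?php
// Hypothetical helper: extract $length characters starting at character $offset.
function substrUtf8(string $str, int $offset, ?int $length = null): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // array_slice() accepts negative offsets and lengths much like substr() does.
    return implode('', array_slice($chars, $offset, $length));
}

$word = "h\xC3\xA9llo"; // "héllo"
var_dump(substr($word, 1, 3));     // native: three bytes
var_dump(substrUtf8($word, 1, 3)); // three characters
```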

substr_replace

UTF-8 aware alternative to substr_replace()

substr_replace(string str, string repl, int start, int|bool|null length = null) : string
static

Replace text within a portion of a string.

link
since

1.3.0

Arguments

str

string: The haystack

repl

string: The replacement string

start

int: Start

length

int|bool|null: Length (optional)

Response

string

transcode

Transcode a string.

transcode(string source, string fromEncoding, string toEncoding) : string|null
static
link
since

1.3.0

Arguments

source

string: The string to transcode.

fromEncoding

string: The source encoding.

toEncoding

string: The target encoding.

Response

string|null: The transcoded string, or null if the source was not a string.

trim

UTF-8 aware replacement for trim()

trim(string str, string|bool charlist = false) : string
static

Strip whitespace (or other characters) from the beginning and end of a string. You only need this method if you supply the optional charlist argument and it contains multi-byte UTF-8 characters; otherwise the native trim() works correctly on a UTF-8 string.

link
since

1.3.0

Arguments

str

string: The string to be trimmed

charlist

string|bool: The optional charlist of additional characters to trim

Response

string: The trimmed string

ucfirst

UTF-8 aware alternative to ucfirst()

ucfirst(string str, string|null delimiter = null, string|null newDelimiter = null) : string
static

Make a string's first character uppercase or all words' first character uppercase.

link
since

1.3.0

Arguments

str

string: String to be processed

delimiter

string|null: The words delimiter (null means do not split the string)

newDelimiter

string|null: The new words delimiter (null means equal to $delimiter)

Response

string: If $delimiter is null, the string with its first character in upper case (if applicable). Otherwise, the string is treated as words separated by $delimiter: ucfirst is applied to each word, and the words are rejoined with the new delimiter.
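The delimiter handling described above can be sketched as follows. The function `ucfirstWords` is hypothetical, and it uses PHP's native ucfirst() as a stand-in, so it only uppercases ASCII letters; it illustrates the split-and-rejoin logic, not the UTF-8 case mapping. A non-empty delimiter is assumed.

```php
<?php
// Hypothetical helper illustrating the delimiter / newDelimiter behaviour.
function ucfirstWords(string $str, ?string $delimiter = null, ?string $newDelimiter = null): string
{
    if ($delimiter === null) {
        // No delimiter: uppercase only the first character of the whole string.
        return ucfirst($str);
    }

    // Null newDelimiter means "rejoin with the original delimiter".
    $newDelimiter = $newDelimiter ?? $delimiter;

    // Uppercase the first character of every word, then rejoin.
    return implode($newDelimiter, array_map('ucfirst', explode($delimiter, $str)));
}

echo ucfirstWords('hello-big-world'), "\n";
echo ucfirstWords('hello-big-world', '-'), "\n";
echo ucfirstWords('hello-big-world', '-', ' '), "\n";
```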

ucwords

UTF-8 aware alternative to ucwords()

ucwords(string str) : string
static

Uppercase the first character of each word in a string.

link
since

1.3.0

Arguments

str

string: String to be processed

Response

string: String with first char of each word uppercase

unicode_to_utf16

Converts Unicode sequences to UTF-16 string.

unicode_to_utf16(string str) : string
static
since

1.3.0

Arguments

str

string: Unicode string to convert

Response

string: UTF-16 string

unicode_to_utf8

Converts Unicode sequences to UTF-8 string.

unicode_to_utf8(string str) : string
static
since

1.3.0

Arguments

str

string: Unicode string to convert

Response

string: UTF-8 string
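The documentation does not spell out what a "Unicode sequence" looks like. Assuming it means escapes of the form `\uXXXX` (four hexadecimal digits), a conversion could be sketched like this. The function `unicodeEscapesToUtf8` is hypothetical, handles only code points in the Basic Multilingual Plane, and does not treat surrogate pairs specially.

```php
<?php
// Hypothetical helper: replace \uXXXX escapes (an assumed input format) with UTF-8 bytes.
function unicodeEscapesToUtf8(string $str): string
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            $cp = hexdec($m[1]);

            if ($cp < 0x80) {
                return chr($cp); // 1 byte
            }

            if ($cp < 0x800) {
                // 2 bytes: 110xxxxx 10xxxxxx
                return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
            }

            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
        },
        $str
    );
}

echo unicodeEscapesToUtf8('caf\u00e9 costs 2\u20ac'), "\n";
```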

valid

Tests a string as to whether it's valid UTF-8 and supported by the Unicode standard.

valid(string str) : bool
static

Note: this function has been modified to simply return true or false.

author

[email protected]

link
see compliant
since

1.3.0

Arguments

str

string: UTF-8 encoded string.

Response

bool: true if valid

Properties

incrementStyles

Increment styles.

static
since

1.3.0

Type(s)

array<string|int, mixed>