In-Depth Understanding of Common Java Classes: String
The String class encapsulates most string-related operations, from constructing a string object to the many ways of manipulating it. In this article we read the source code of the String class to understand the principles behind these operations. The main points are as follows.
I. Cumbersome constructors
Before we learn to manipulate strings, we should understand the several ways of constructing a string object. Let's start with the first constructor.
    private final char value[];

    public String() {
        this.value = "".value;
    }
The first private field in the String source code is value, an array of characters. It is declared final, so the reference cannot be reassigned once it has been initialized, and the class never modifies the array's contents or exposes the array to callers. In other words, a string object is essentially a wrapper around a character array that does not change after construction, which explains one of the best-known properties of String objects: immutability. The first constructor is simple: it points value at the array of the empty string literal "", so the new object is an empty string, not null.
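As a quick check of the two properties just described, here is a minimal sketch (the class and method names are made up for illustration). It shows that the no-argument constructor yields an empty, non-null string, and that an operation such as concat returns a new object instead of changing the original.

```java
class EmptyStringDemo {
    // True when new String() produces an empty string rather than null.
    static boolean isEmptyButNotNull() {
        String s = new String();
        return s != null && s.isEmpty() && s.equals("");
    }

    // concat returns a new String; the original object is left untouched.
    static String originalAndChanged() {
        String original = "walker";
        String changed = original.concat("!");
        return original + "|" + changed;
    }

    public static void main(String[] args) {
        System.out.println(isEmptyButNotNull());  // true
        System.out.println(originalAndChanged()); // walker|walker!
    }
}
```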
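The next few constructors are also simple. They all work on the value array, but because a String must stay immutable, those that receive a char array from the caller copy it rather than store the caller's array directly. (The constructor that takes another String can share that string's array, since it can never change either.)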
    // 1
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

    // 2
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

    // 3
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset + count);
    }
Whether a String is passed in (the first constructor) or a char array is passed in directly (the second), the work comes down to assigning the value array of the object being created. The third constructor takes only part of the char array: it builds the string from the count characters that start at index offset. The method first checks the boundary cases and throws the corresponding exceptions; the core call is Arrays.copyOfRange, which actually copies the character array.
Arrays.copyOfRange takes three parameters: the source array, the start index (inclusive), and the end index (exclusive). It does two main things: first it computes the length of the new array from the start and end positions, then it calls a native method to perform the copy.
    System.arraycopy(original, from, copy, 0, Math.min(original.length - from, newLength));
Although System.arraycopy is a native method, we can roughly guess what it does: conceptually it is equivalent to a for or while loop that assigns each element of the source to the destination. Let's look at an example.
    public static void main(String[] args) {
        char[] chs = new char[]{'w', 'a', 'l', 'k', 'e', 'r'};
        String s = new String(chs, 0, 3);
        System.out.println(s);
    }
Output result: wal
You can see that the range is half-open, [a, b): it includes the start index but not the end index, so the example above takes only the characters at indices 0, 1, and 2 and does not include index 3. This convention is common in other String methods as well.
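The copy step and its effect can be sketched as follows. This is a simplified illustration, not the JDK source: copyRange is a hypothetical helper that mirrors the two steps described above (compute the new length, then call System.arraycopy), and buildThenMutateSource shows why the copy matters, since changing the caller's array afterwards does not affect the string.

```java
class CopyDemo {
    // Hypothetical helper: allocate a new array, then copy the half-open range [from, to).
    static char[] copyRange(char[] original, int from, int to) {
        int newLength = to - from;
        if (newLength < 0) {
            throw new IllegalArgumentException(from + " > " + to);
        }
        char[] copy = new char[newLength];
        System.arraycopy(original, from, copy, 0,
                Math.min(original.length - from, newLength));
        return copy;
    }

    // The String keeps its own copy, so later writes to chs are not visible in it.
    static String buildThenMutateSource() {
        char[] chs = {'w', 'a', 'l', 'k', 'e', 'r'};
        String s = new String(chs, 0, 3);
        chs[0] = 'X';
        return s;
    }

    public static void main(String[] args) {
        char[] chs = {'w', 'a', 'l', 'k', 'e', 'r'};
        System.out.println(new String(copyRange(chs, 0, 3))); // wal
        System.out.println(buildThenMutateSource());          // wal
    }
}
```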
The constructors described so far all build a String from a character array. The following constructors build a String from a byte array instead, which naturally raises the question of character encoding. Here is the first byte-array constructor.
    public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }
This method first ensures that charsetName is not null, then calls checkBounds to verify that offset and length are not negative and that offset + length does not exceed bytes.length. It then calls the core method, StringCoding.decode, which converts the byte array into a char array according to the specified encoding. Let's look at it.
    static char[] decode(String charsetName, byte[] ba, int off, int len)
            throws UnsupportedEncodingException {
        StringDecoder sd = deref(decoder);
        String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
        if ((sd == null) || !(csn.equals(sd.requestedCharsetName())
                              || csn.equals(sd.charsetName()))) {
            sd = null;
            try {
                Charset cs = lookupCharset(csn);
                if (cs != null)
                    sd = new StringDecoder(cs, csn);
            } catch (IllegalCharsetNameException x) {}
            if (sd == null)
                throw new UnsupportedEncodingException(csn);
            set(decoder, sd);
        }
        return sd.decode(ba, off, len);
    }
The method first obtains the decoder cached for the current thread through deref, then picks the charset name with a ternary expression, falling back to ISO-8859-1 when none is given. The if statement that follows handles the case where no StringDecoder is cached for the current thread, or the cached one does not match the requested charset: a new StringDecoder is created and stored in the cache. Finally, its decode method performs the actual decoding. In practice, the following constructor is the one we use more often to turn a byte array into a string.
    public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }
You only need to specify a byte array and a charset name; internally it still calls the constructor described above. You can also omit the charset entirely, in which case the platform's default charset is used (UTF-8 on many systems, and the standard default since JDK 18 unless overridden).
    public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }
An even more concise overload is also available.
    public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }
In practice, however, the constructor most often used to convert a byte array into a string is the two-parameter one that takes the byte array and a charset name, because it makes the encoding explicit.
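To make the role of the charset concrete, here is a small sketch (class and method names are illustrative). It decodes the same two bytes with two different charsets and shows what happens when the charset name is not supported. Note that decodeUtf8 and decodeLatin1 use the String(byte[], Charset) overload with the StandardCharsets constants, which is a different overload from the name-based constructors shown above and declares no checked exception.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes with an explicit Charset object.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes by charset name; returns null when the name is not supported.
    static String decodeOrNull(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        System.out.println(decodeUtf8(bytes).length());             // 1
        System.out.println(decodeLatin1(bytes).length());           // 2
        System.out.println(decodeOrNull(bytes, "no-such-charset")); // null
    }
}
```

The same bytes become one character under UTF-8 and two under ISO-8859-1, which is why relying on the platform default can produce garbled text when data moves between machines.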
That covers the source code of most of the constructors in the String class. Some of it touches on the underlying platform and related topics that I do not understand deeply, so please bear with me. Now let's look at other operations of the String class.
II. Common functions for property state
The functions in this category are fairly simple. The main ones are the following.
    // Return the length of the string
    public int length() {
        return value.length;
    }

    // Determine whether the string is empty
    public boolean isEmpty() {
        return value.length == 0;
    }

    // Get the single character at a specified position in the string
    public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }
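A short usage sketch of these three methods follows. The helper charAtOrDefault is hypothetical and exists only to show the exception that charAt throws for an out-of-range index.

```java
class PropertyDemo {
    // Returns the character at index, or fallback when index is out of range.
    static char charAtOrDefault(String s, int index, char fallback) {
        try {
            return s.charAt(index);
        } catch (StringIndexOutOfBoundsException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        String s = "walker";
        System.out.println(s.length());                 // 6
        System.out.println(s.isEmpty());                // false
        System.out.println("".isEmpty());               // true
        System.out.println(charAtOrDefault(s, 0, '?')); // w
        System.out.println(charAtOrDefault(s, 6, '?')); // ?
    }
}
```

Valid indices run from 0 to length() - 1, so index 6 of a six-character string is already out of range.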
That is all there is to the property functions; they are relatively simple. Next, let's look at the common functions for getting internal values.
III. Common functions for getting internal values
There are two main kinds of function in this category: those that produce an array of characters and those that produce an array of bytes. Let's first look at the method that copies characters out.
    public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
        if (srcBegin < 0) {
            throw new StringIndexOutOfBoundsException(srcBegin);
        }
        if (srcEnd > value.length) {
            throw new StringIndexOutOfBoundsException(srcEnd);
        }
        if (srcBegin > srcEnd) {
            throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
        }
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }
This function copies the characters of the current String's value array in the range from srcBegin (inclusive) to srcEnd (exclusive) into the target array dst, writing them into dst starting at index dstBegin. Here is an example.
    public static void main(String[] args) {
        String str = "hello-walker";
        char[] chs = new char[6];
        str.getChars(0, 5, chs, 1);
        for (int a = 0; a < chs.length; a++) {
            System.out.println(chs[a]);
        }
    }
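The program prints six lines. The first is chs[0], which was never written and still holds the default char value '\u0000' (usually displayed as a blank or invisible character); the next five are h, e, l, l, and o.
We take the five characters in the range [0, 5) of str and copy them one by one into chs, starting at index 1 of chs. That is all there is to the function that gets an array of characters, so let's look at the function that gets an array of bytes.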
    public byte[] getBytes(String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null) throw new NullPointerException();
        return StringCoding.encode(charsetName, value, 0, value.length);
    }
The core of this function, StringCoding.encode, closely mirrors StringCoding.decode above: decode uses a charset to turn bytes into the characters of a string, while encode uses a charset to turn a string into a byte array. getBytes has a few more overloads, but they follow the same pattern as the method listed above and differ mainly in how the charset is supplied (a Charset object, or the platform default when none is given).
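The effect of the chosen charset on the resulting byte array can be seen in a small sketch (class and method names are illustrative). It uses the getBytes(Charset) overload with StandardCharsets constants, so no checked exception has to be handled.

```java
import java.nio.charset.StandardCharsets;

class EncodeDemo {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16beLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Encoding and then decoding with the same charset should give back the original.
    static boolean roundTrips(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        String s = "h\u00E9llo"; // five characters, one of them outside ASCII
        System.out.println(utf8Length(s));    // 6
        System.out.println(utf16beLength(s)); // 10
        System.out.println(roundTrips(s));    // true
    }
}
```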
IV. Comparison functions
We often use the equals method in everyday projects. Does it do the same thing as the == operator? Let's look at its source.
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String) anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }
The first check in the method uses ==, which tests whether two references point to the same object in memory (if they do, the values they encapsulate are naturally equal). As the code shows, equals first checks whether the two references point to the same object and returns true if so; only otherwise does it check that the argument is a String of the same length and then compare the two internal arrays character by character.
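The difference between == and equals is easy to observe with a string literal and a copy created with new. The following is a minimal sketch; sameReference is a hypothetical helper that simply wraps the == comparison.

```java
class EqualsDemo {
    // Reference comparison: true only when both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "walker";
        String copy = new String("walker"); // a distinct object with the same content

        System.out.println(sameReference(literal, copy));          // false
        System.out.println(literal.equals(copy));                  // true
        // intern() returns the pooled instance, which is the literal itself.
        System.out.println(sameReference(literal, copy.intern())); // true
    }
}
```

This is why string content should be compared with equals: two strings with identical characters are not necessarily the same object.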
    public boolean equalsIgnoreCase(String anotherString) {
        return (this == anotherString) ? true
                : (anotherString != null)
                && (anotherString.value.length == value.length)
                && regionMatches(true, 0, anotherString, 0, value.length);
    }
This method tests equality while ignoring case; its core is regionMatches.
    public boolean regionMatches(boolean ignoreCase, int toffset,
            String other, int ooffset, int len) {
        char ta[] = value;
        int to = toffset;
        char pa[] = other.value;
        int po = ooffset;
        if ((ooffset < 0) || (toffset < 0)
                || (toffset > (long) value.length - len)
                || (ooffset > (long) other.value.length - len)) {
            return false;
        }
        while (len-- > 0) {
            char c1 = ta[to++];
            char c2 = pa[po++];
            if (c1 == c2) {
                continue;
            }
            if (ignoreCase) {
                char u1 = Character.toUpperCase(c1);
                char u2 = Character.toUpperCase(c2);
                if (u1 == u2) {
                    continue;
                }
                if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    continue;
                }
            }
            return false;
        }
        return true;
    }
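From the caller's side, equalsIgnoreCase and regionMatches behave as in the following sketch. The helper endsWithIgnoreCase is hypothetical; it is only meant to show how the offset and length arguments select a region and how an out-of-range offset leads to false rather than an exception.

```java
class RegionDemo {
    // Case-insensitive "ends with", built on regionMatches.
    static boolean endsWithIgnoreCase(String s, String suffix) {
        int start = s.length() - suffix.length();
        // A negative start is rejected by regionMatches itself, which returns false.
        return s.regionMatches(true, start, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        System.out.println("Hello".equalsIgnoreCase("hELLO"));                     // true
        System.out.println("Hello".equalsIgnoreCase(null));                        // false
        System.out.println(endsWithIgnoreCase("hello-walker", "WALKER"));          // true
        System.out.println(endsWithIgnoreCase("abc", "xabcd"));                    // false
        System.out.println("hello-walker".regionMatches(false, 6, "WALKER", 0, 6)); // false
    }
}
```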
regionMatches first validates its arguments, returning false if either offset is negative or if the requested region extends past the end of either string. It then reads the two character arrays in step and compares them. If two characters are equal, the rest of the loop body is skipped and the loop moves on to the next pair. Otherwise, when ignoreCase is true, both characters are converted to uppercase and compared, and then to lowercase and compared; if either comparison matches, the loop again continues. Any pair that still differs makes the method return false, and true is returned only after every pair has matched. The equals method can only tell whether two strings are equal; it says nothing about which is larger. That is the job of compareTo, which orders two strings.
    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;
        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }
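To make the return values of compareTo concrete, here is a small sketch. The helper sign is hypothetical and reduces the result to -1, 0, or 1, because callers should rely only on the sign of the result; the raw differences mentioned in the comments follow from the implementation shown above.

```java
class CompareDemo {
    // Reduces compareTo to its sign, which is the part callers should depend on.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        // First differing pair is 'a' vs 'b'; the code above returns 'a' - 'b'.
        System.out.println(sign("apple", "banana")); // -1
        // "abc" is a prefix of "abcd"; the code above returns the length difference.
        System.out.println(sign("abc", "abcd"));     // -1
        // Uppercase letters have smaller code units than lowercase ones.
        System.out.println(sign("Zebra", "apple"));  // -1
        System.out.println(sign("abc", "abc"));      // 0
        System.out.println(sign("b", "a"));          // 1
    }
}
```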
This method orders the two strings lexicographically: it returns the difference between the first pair of characters that differ or, if one string is a prefix of the other, the difference between their lengths. The code is straightforward, so we will not go through it line by line. The case-insensitive counterpart is the following method.
    public int compareToIgnoreCase(String str) {
        return CASE_INSENSITIVE_ORDER.compare(this, str);
    }
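CASE_INSENSITIVE_ORDER is a static Comparator<String> field of String, an instance of the private nested class CaseInsensitiveComparator; the compare method called here belongs to that class.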
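Because CASE_INSENSITIVE_ORDER is a public comparator, it can also be passed directly to sorting methods. The following sketch (class and method names are illustrative) contrasts it with the natural ordering defined by compareTo.

```java
import java.util.Arrays;

class CaseInsensitiveSortDemo {
    // Returns a sorted copy, leaving the caller's array unchanged.
    static String[] sortedIgnoringCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"walker", "Banana", "apple"};

        String[] natural = words.clone();
        Arrays.sort(natural); // natural order uses compareTo
        System.out.println(Arrays.toString(natural));                   // [Banana, apple, walker]
        System.out.println(Arrays.toString(sortedIgnoringCase(words))); // [apple, Banana, walker]
        System.out.println("ABC".compareToIgnoreCase("abc"));           // 0
    }
}
```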
To keep this article from growing too long, we will stop here for now. The next article will walk through the source code of the most commonly used string manipulation functions. This summary surely has shortcomings, so corrections are welcome!