In-Depth Understanding of Common Java Classes: String (II)
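The previous article covered the String class's constructors and the methods for reading its internal attributes, but left out the most commonly used operations. This article picks up where that one stopped and looks at how these everyday methods are implemented internally. The first method: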
    public boolean startsWith(String prefix, int toffset) {
        char ta[] = value;
        int to = toffset;
        char pa[] = prefix.value;
        int po = 0;
        int pc = prefix.value.length;
        // Note: toffset might be near -1>>>1.
        if ((toffset < 0) || (toffset > value.length - pc)) {
            return false;
        }
        while (--pc >= 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }
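This method checks whether the current string starts with the given substring. The prefix parameter is that substring, and toffset is the index in the original string at which the comparison begins. An example first: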
    public static void main(String[] args) {
        String str = "hello-walker";
        System.out.println(str.startsWith("wa", 0));
        System.out.println(str.startsWith("wa", 6));
    }
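The results are as follows: the first call prints false, because the string does not begin with "wa" at index 0, and the second prints true, because "walker" begins at index 6.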
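The source code is fairly easy to follow. It starts with a simple check: if toffset is negative, or if toffset plus the length of prefix exceeds the length of the original string, it returns false immediately. A while loop then compares the two strings one character at a time, starting at position toffset in the original string and position 0 in prefix. As soon as the characters at some position differ it returns false; if the loop finishes, it returns true. The method also has an overload that defaults toffset to 0, so the comparison starts at the beginning of the original string.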
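The bounds check is written as toffset > value.length - pc rather than toffset + pc > value.length, and the comment about -1>>>1 (which equals Integer.MAX_VALUE) hints at why: the addition could overflow. Here is a minimal sketch of the difference; safeCheck and naiveCheck are hypothetical helpers written for this illustration and are not part of String.

```java
public class StartsWithBounds {
    // Hypothetical helper mirroring the bounds check in the quoted source.
    static boolean safeCheck(int length, int prefixLength, int toffset) {
        return toffset >= 0 && toffset <= length - prefixLength;
    }

    // Hypothetical naive variant: toffset + prefixLength can overflow
    // and wrap around to a negative number, so the check wrongly passes.
    static boolean naiveCheck(int length, int prefixLength, int toffset) {
        return toffset >= 0 && toffset + prefixLength <= length;
    }

    public static void main(String[] args) {
        int huge = Integer.MAX_VALUE; // same value as -1 >>> 1
        System.out.println(safeCheck(12, 2, huge));                // false
        System.out.println(naiveCheck(12, 2, huge));               // true, because of overflow
        System.out.println("hello-walker".startsWith("wa", huge)); // false
    }
}
```

With the naive form, the loop would go on to read far outside the array; the subtraction form rejects the offset before any character is touched.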
The endsWith method actually delegates to the startsWith method described above.
    public boolean endsWith(String suffix) {
        return startsWith(suffix, value.length - suffix.value.length);
    }
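As we can see, the method calls startsWith and passes value.length - suffix.value.length as the second argument. That argument makes the comparison skip the leading characters and jump straight to the position where exactly suffix.value.length characters remain. If suffix is longer than the string, the offset is negative and startsWith returns false.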
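The same delegation can be reproduced with the public API. The sketch below is only an illustration; endsWithViaStartsWith is a hypothetical helper name, and it uses length() because the value field is private.

```java
public class EndsWithDemo {
    // Hypothetical helper: endsWith expressed through the public startsWith overload.
    static boolean endsWithViaStartsWith(String s, String suffix) {
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        String str = "hello-walker";
        System.out.println(endsWithViaStartsWith(str, "walker"));  // true, offset is 6
        System.out.println(endsWithViaStartsWith(str, "hello"));   // false, offset is 7
        System.out.println(endsWithViaStartsWith("abc", "xxabc")); // false, offset is -2
        System.out.println(str.endsWith("walker"));                // true
    }
}
```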
Here we look at the implementation of hashCode in the String class.
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
I know how it is implemented, but I do not know why it was designed this way. All I can say is that each step multiplies the running hash by 31 and then adds the numeric value of the current char, and that the result is stored in the hash field so later calls can skip the loop. An important method follows.
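To make the loop concrete, the sketch below recomputes the hash by hand and compares it with String.hashCode. The name manualHash is a hypothetical helper for this illustration. As for why 31 was chosen, a commonly cited explanation (not something the quoted source states) is that it is an odd prime and that 31 * h can be computed as (h << 5) - h.

```java
public class HashCodeDemo {
    // Hypothetical helper: the same recurrence as the quoted loop, h = 31 * h + c.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' is 97 and 'b' is 98, so the hash of "ab" is 97 * 31 + 98 = 3105.
        System.out.println(manualHash("ab"));                     // 3105
        System.out.println("ab".hashCode());                      // 3105
        System.out.println(manualHash("hello-walker") == "hello-walker".hashCode()); // true

        // The multiplication can be rewritten as a shift and a subtraction.
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h);               // true
    }
}
```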
    public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            return -1;
        }
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }
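indexOf returns the position of the first occurrence of a character, and lastIndexOf is its counterpart; we will take them one at a time. The method above takes two parameters. The first is the character to look for. It is declared as int because a char widens to int implicitly and because an int can also carry a supplementary code point that does not fit in a single char. The second, fromIndex, is where the search starts: a negative value is treated as 0, and a value at or beyond the length returns -1 immediately. The rest of the code has two parts: one handles the common case, and the other handles supplementary characters, which we will not study for now. The first part is simple: it walks the string from fromIndex, returns the current position if it finds the character, and returns -1 otherwise. The method also has overloads; the single-argument indexOf(int ch), for example, simply calls the method above with fromIndex set to 0.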
lastIndexOf works the same way, except that it searches from the end toward the beginning, so it is not covered in detail here.
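A short example of both methods may help, including the fromIndex clamping and the supplementary branch that the walkthrough skips. The sample() helper is hypothetical and only builds a test string containing one code point above U+FFFF.

```java
public class IndexOfDemo {
    // Hypothetical helper: "a", then U+1F600 (stored as two chars), then "b".
    static String sample() {
        return "a" + new String(Character.toChars(0x1F600)) + "b";
    }

    public static void main(String[] args) {
        String s = "hello-walker";
        System.out.println(s.indexOf('l'));      // 2
        System.out.println(s.indexOf('l', 4));   // 8
        System.out.println(s.indexOf('l', -5));  // 2, negative fromIndex is treated as 0
        System.out.println(s.indexOf('l', 100)); // -1, fromIndex is past the end
        System.out.println(s.lastIndexOf('l'));  // 8
        System.out.println(s.indexOf('z'));      // -1

        String t = sample();
        System.out.println(t.length());          // 4, the code point takes two chars
        System.out.println(t.indexOf(0x1F600));  // 1, handled by the supplementary branch
        System.out.println(t.indexOf('b'));      // 3
    }
}
```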
Next, a method that extracts a substring:
    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }
The first two checks reject out-of-range indices, and the final statement is the heart of the method: if beginIndex is 0, the current string object is returned as is; otherwise a new String is constructed from the remaining characters. The method also has an overload:
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
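The two parameters of this overload show the difference: with only beginIndex, everything from that index to the end is taken, whereas with endIndex the method returns just the substring from beginIndex (inclusive) to endIndex (exclusive). The implementation is similar, with a few extra checks.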
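The following sketch exercises both overloads and the three range checks. throwsOutOfBounds is a hypothetical helper added only to make the exceptions easy to observe.

```java
public class SubstringDemo {
    // Hypothetical helper: reports whether substring(begin, end) rejects the range.
    static boolean throwsOutOfBounds(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hello-walker";
        System.out.println(s.substring(6));               // walker
        System.out.println(s.substring(0, 5));            // hello
        System.out.println(s.substring(12).isEmpty());    // true, beginIndex may equal the length

        System.out.println(throwsOutOfBounds(s, -1, 3));  // true, beginIndex < 0
        System.out.println(throwsOutOfBounds(s, 0, 13));  // true, endIndex > length
        System.out.println(throwsOutOfBounds(s, 5, 2));   // true, endIndex < beginIndex
    }
}
```

In the source quoted above, substring(0) and substring(0, length) return the same object rather than a copy. That is an implementation detail, so code should not rely on the identity.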
The method described below concatenates two different strings.
    public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }
This method takes one parameter, the string to append to the current one. The first three lines are simple: they check whether str is empty and, if so, return the current string object directly. Many methods in the source follow this pattern, with a series of checks up front and the core logic at the end. This helps efficiency: a call that does not need the real work returns early and never reaches the more expensive core. Arrays.copyOf creates a larger array that can hold both strings and copies the original string into it, leaving the trailing otherLen slots empty for str. getChars then copies the characters of str into buf starting at offset len, and finally a new String object is constructed from buf and returned.
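The two calls at the end of concat, getChars(char[], int) and the String(char[], boolean) constructor, are not public, so they cannot be used from outside the class. The sketch below follows the same steps with the public API instead, under that assumption; it makes one more array copy than the real method does, and ConcatSketch.concat is a hypothetical name.

```java
import java.util.Arrays;

public class ConcatSketch {
    // Hypothetical re-creation of the steps in String.concat using public methods only.
    static String concat(String a, String b) {
        if (b.isEmpty()) {
            return a;                       // early exit, nothing to append
        }
        int len = a.length();
        // Copy a into a larger array; the extra slots are filled with '\u0000'.
        char[] buf = Arrays.copyOf(a.toCharArray(), len + b.length());
        // Copy all of b into buf, starting at offset len.
        b.getChars(0, b.length(), buf, len);
        return new String(buf);             // the public constructor copies buf again
    }

    public static void main(String[] args) {
        System.out.println(concat("hello-", "walker"));  // hello-walker
        String s = "hello";
        System.out.println(concat(s, "") == s);          // true, same object returned
    }
}
```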
Next, a more practical method:
    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */
            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }
This method replaces a given character in the string; note that it replaces every occurrence of oldChar, not just the first. It first checks whether oldChar (the character to be replaced) equals newChar (its replacement); if so, nothing needs to be done and the current string object is returned. Otherwise, the first while loop finds the first occurrence of oldChar. If there is one, a new char array of the same length as value is allocated and all characters before that position are copied into it. The second while loop then walks the rest of value, writing newChar wherever it finds oldChar and copying every other character unchanged. Finally, a String built from the new array is returned. If oldChar never occurs, no array is allocated and the current object is returned.
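The notable design choice here is that nothing is allocated until the first match is found. Below is a minimal sketch of that idea on a plain char array; ReplaceCharSketch.replace is a hypothetical helper, and it uses System.arraycopy for the prefix where the original uses a loop.

```java
public class ReplaceCharSketch {
    // Hypothetical helper: returns the SAME array when there is nothing to replace,
    // and a new array only once the first occurrence of oldChar has been found.
    static char[] replace(char[] val, char oldChar, char newChar) {
        if (oldChar == newChar) {
            return val;
        }
        int i = 0;
        while (i < val.length && val[i] != oldChar) {
            i++;                                   // phase 1: locate the first match
        }
        if (i == val.length) {
            return val;                            // no match, no allocation
        }
        char[] buf = new char[val.length];
        System.arraycopy(val, 0, buf, 0, i);       // copy the untouched prefix
        for (; i < val.length; i++) {              // phase 2: replace from the match onward
            buf[i] = (val[i] == oldChar) ? newChar : val[i];
        }
        return buf;
    }

    public static void main(String[] args) {
        char[] src = "hello-walker".toCharArray();
        System.out.println(new String(replace(src, 'l', 'L'))); // heLLo-waLker
        System.out.println(replace(src, 'z', 'q') == src);      // true, no copy was made
        System.out.println("hello-walker".replace('l', 'L'));   // heLLo-waLker
    }
}
```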
The method above can only replace a single character; it cannot replace a substring. The next few methods are all used to replace substrings.
    // 1. Replace the first substring that matches the regular expression
    public String replaceFirst(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
    }

    // 2. Replace every substring that matches the regular expression
    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }

    // 3. Replace every occurrence of a literal character sequence
    public String replace(CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
                this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }
The first method is easy to understand: it replaces only the first match of the regular expression. The second and third both replace every occurrence. The difference is that replaceAll treats its first argument as a regular expression, while replace treats its target as a literal character sequence. For example:
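    public static void main(String[] args) {
        String str = "aaabssddaa\\\\";
        System.out.println(str);
        System.out.println(str.replace("\\\\", "x"));
        System.out.println(str.replaceAll("\\\\", "x"));
    }

Output:

    aaabssddaa\\
    aaabssddaax
    aaabssddaaxx

In Java source code, \ is the escape character, so the four backslashes in the str literal become two actual backslashes in the string. The backslash is also the escape character in regular expressions, so the first argument of replaceAll is escaped twice: Java turns the four backslashes into two, and the regex engine turns those two into a single literal backslash. replaceAll therefore matches each of the two backslashes in str separately and prints two x's. In replace, the four backslashes are escaped only once, by Java, which gives a literal two-backslash target that matches once, so replace prints a single x. The difference between the two is that one is based on regular expressions and the other works only on literal character sequences.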
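The same distinction shows up with any regex metacharacter, not just the backslash. The sketch below uses a dot. literalReplaceAll is a hypothetical helper that gets literal behaviour out of replaceAll by quoting both arguments with Pattern.quote and Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceVsRegexDemo {
    // Hypothetical helper: make replaceAll behave like the literal replace.
    static String literalReplaceAll(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace(".", "-"));           // a-b-c, the dot is literal
        System.out.println(s.replaceAll(".", "-"));        // -----, the dot matches any char
        System.out.println(s.replaceFirst(".", "-"));      // -.b.c, only the first match
        System.out.println(s.replaceAll("\\.", "-"));      // a-b-c, the dot is escaped
        System.out.println(literalReplaceAll(s, ".", "$")); // a$b$c
    }
}
```

Quoting the replacement matters as well, because $ and \ have special meaning in the replacement string of replaceAll and replaceFirst.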
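Next is split, which splits a string. There is a lot of code, so it is not reproduced here; I will just outline how it works. Its parameter is again a regular expression. Internally, an ArrayList is defined along with a Matcher for matching against the string. A while loop calls find on the original string; whenever a match is found, the characters before the match are taken with subSequence and added to the ArrayList, and the start position moves from 0 to just past the current match before the search continues. Finally, the ArrayList's toArray method returns the result as a String array.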
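The loop described above can be sketched roughly as follows. This is a simplified illustration and not the JDK code: SplitSketch.split is a hypothetical helper, it has no limit parameter, it does not treat zero-length matches specially, and it drops trailing empty strings the way the single-argument split does. As far as I can tell, in OpenJDK 8 this kind of loop lives in Pattern.split, which String.split delegates to for general regular expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitSketch {
    // Hypothetical, simplified version of the find/subSequence loop.
    static String[] split(String input, String regex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0;                                   // start of the current piece
        while (m.find()) {
            parts.add(input.subSequence(index, m.start()).toString());
            index = m.end();                             // continue just past the match
        }
        if (parts.isEmpty()) {
            return new String[] { input };               // no match at all
        }
        parts.add(input.substring(index));               // the remaining tail
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;                                      // drop trailing empty strings
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", split("hello-walker-yam", "-"))); // hello|walker|yam
        System.out.println(split("a,b,,c,,", ",").length);                    // 4
    }
}
```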
Next, the join method:
    public static String join(CharSequence delimiter, CharSequence... elements) {
        Objects.requireNonNull(delimiter);
        Objects.requireNonNull(elements);
        // Number of elements not likely worth Arrays.stream overhead.
        StringJoiner joiner = new StringJoiner(delimiter);
        for (CharSequence cs: elements) {
            joiner.add(cs);
        }
        return joiner.toString();
    }
First, note that the method is static. It relies on the StringJoiner class, which has the following constructor.
    public StringJoiner(CharSequence delimiter,
                        CharSequence prefix,
                        CharSequence suffix) {
        Objects.requireNonNull(prefix, "The prefix must not be null");
        Objects.requireNonNull(delimiter, "The delimiter must not be null");
        Objects.requireNonNull(suffix, "The suffix must not be null");
        // make defensive copies of arguments
        this.prefix = prefix.toString();
        this.delimiter = delimiter.toString();
        this.suffix = suffix.toString();
        this.emptyValue = this.prefix + this.suffix;
    }
This constructor assigns some of the class's fields. What each one does will be explained when we meet it again, so for now it is enough to know they exist. join constructs a StringJoiner with only the delimiter, which leaves prefix and suffix empty, and then calls the add method on the resulting object:
    public StringJoiner add(CharSequence newElement) {
        prepareBuilder().append(newElement);
        return this;
    }

    private StringBuilder prepareBuilder() {
        // value here is a StringBuilder instance, a member of StringJoiner
        if (value != null) {
            value.append(delimiter);
        } else {
            value = new StringBuilder().append(prefix);
        }
        return value;
    }
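The first call to add takes the else branch: it creates a new StringBuilder, appends prefix (which was set to the empty string when the constructor was called), and assigns it to the member variable value. Back in add, the element is appended to the StringBuilder. On the second call, prepareBuilder only appends the delimiter to the StringBuilder instance, and add then appends the second element. In this way the elements are joined with the separator inside the StringBuilder instance, and toString returns the result. An example: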
    public static void main(String[] args) {
        String[] strs = new String[]{"hello", "walker", "yam", "cyy", "huaaa"};
        System.out.println(String.join("-", strs));
    }
Output: hello-walker-yam-cyy-huaaa
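String.join always leaves prefix and suffix empty, so the role of those fields is easier to see by using StringJoiner directly. In the sketch below, bracketed is a hypothetical helper. It also shows emptyValue, which the constructor sets to prefix + suffix and which toString returns when nothing has been added.

```java
import java.util.StringJoiner;

public class StringJoinerDemo {
    // Hypothetical helper: joins items with ", " and wraps the result in brackets.
    static String bracketed(String... items) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String item : items) {
            joiner.add(item);      // the delimiter is added before every element but the first
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(bracketed("hello", "walker", "yam")); // [hello, walker, yam]
        System.out.println(bracketed("solo"));                   // [solo]
        System.out.println(bracketed());                         // [], the emptyValue
    }
}
```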
Finally, there are two more methods. They are simple, so their implementation is not explained in detail.
    // Returns a copy of the internal character array; value itself is not
    // returned directly, to preserve encapsulation
    public char[] toCharArray() {
        char result[] = new char[value.length];
        System.arraycopy(value, 0, result, 0, value.length);
        return result;
    }

    // Removes leading and trailing whitespace
    public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value; /* avoid getfield opcode */
        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }
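Two details of these methods are easy to miss. trim strips every char whose value is less than or equal to ' ' (so tabs and newlines too), but nothing above it, and toCharArray hands out a copy, so changing it cannot affect the string. A minimal sketch follows; copyIsIndependent is a hypothetical helper.

```java
public class TrimDemo {
    // Hypothetical helper: modifies the array returned by toCharArray and
    // checks that the original string did not change.
    static boolean copyIsIndependent(String s) {
        char[] copy = s.toCharArray();
        if (copy.length == 0) {
            return true;
        }
        copy[0] = (char) (copy[0] + 1);
        return s.charAt(0) != copy[0];
    }

    public static void main(String[] args) {
        // Tab, space and newline are all <= ' ', so trim removes them.
        System.out.println("[" + "\t hello walker \n".trim() + "]"); // [hello walker]
        // U+2003 (em space) is greater than ' ', so trim leaves it alone.
        System.out.println("\u2003x".trim().length());               // 2
        System.out.println(copyIsIndependent("hello"));              // true
    }
}
```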
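That roughly concludes this reading of the String source code. It does not cover all of the code, and some parts I could not fully work out given the limits of my own ability. Apologies for any shortcomings in this summary!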