[MM’s] Vowel Notes — Engineering Java vowel-checking with micro-benchmarks

Exploring the trade-offs between readability and raw speed for detecting a vowel in a string.


Inspired by Austin Henley’s exploration of vowels, “The fastest way to detect a vowel in a string”, this post brings the concept to Java. Vowels may be small, but detecting them efficiently reveals a lot about string processing in Java.

We’ll walk through multiple approaches — from classic loops to modern streams — and measure their performance with micro-benchmarks. Whether you’re optimizing for speed or clarity, there’s something here for every Java developer.

The Classics: Loop-Based Approaches

These methods are fundamental, easy to reason about and a great starting point.

1. Simple Loop with Character Check

This is the most direct and explicit way to solve the problem.

```java
public boolean hasVowels(String text) {
    for (var c : text.toCharArray()) {
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u' ||
            c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U') {
            return true;
        }
    }
    return false;
}
```
  • How it works: It iterates through each character and uses a series of OR (||) conditions to check against a hard-coded list of vowels.
  • Pros: Extremely easy for beginners to understand.
  • Cons: Verbose and less maintainable, and toCharArray() copies the whole string into a new array before the first comparison.
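The chain of || comparisons can be written more compactly with a switch expression (Java 14+). The sketch below is a hypothetical variant that was not part of the benchmark. It also reads characters with charAt() instead of copying them with toCharArray(). The class name SwitchVowelChecker is illustrative.

```java
final class SwitchVowelChecker {
    // Hypothetical variant (not benchmarked): a switch expression replaces
    // the chain of || comparisons from the simple loop.
    static boolean isVowel(char c) {
        return switch (c) {
            case 'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U' -> true;
            default -> false;
        };
    }

    static boolean hasVowels(String text) {
        // charAt() reads the string in place; no char[] copy is made.
        for (int i = 0; i < text.length(); i++) {
            if (isVowel(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

The vowel list still lives inside the method, so this only addresses verbosity, not the maintainability concern.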

2. Loop with String.indexOf()

A small refinement that makes our vowel list much easier to manage.

```java
private static final String VOWELS = "aeiouAEIOU";

public boolean hasVowels(String text) {
    for (var c : text.toCharArray()) {
        if (VOWELS.indexOf(c) != -1) {
            return true;
        }
    }
    return false;
}
```
  • How it works: The vowels are stored in a single String, and VOWELS.indexOf(c) returns the position of c in that string, or -1 if it is absent.
  • Pros: Centralizes the vowel definition, making the code clean, maintainable and readable.
  • Cons: indexOf() scans the 10-character VOWELS string for every input character, so each consonant costs up to ten comparisons instead of a single table lookup.

The Modern Way: Stream-Based Approach

Leveraging the Stream API (Java 8+) and Set.of (Java 9+), this approach is more declarative and functional.

3. anyMatch with a Set

This is an elegant, declarative solution that pairs a stream with a Set lookup.

```java
import java.util.Set;

private static final Set<Character> VOWELS =
        Set.of('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U');

public boolean hasVowels(String text) {
    return text.chars()
            .mapToObj(ch -> (char) ch)
            .anyMatch(VOWELS::contains);
}
```
  • How it works: text.chars() produces an IntStream of the string's char values, mapToObj boxes each one into a Character, and anyMatch short-circuits as soon as one of them is contained in the VOWELS set.
  • Pros: Clean, expressive one-liner. A Set provides O(1) average lookup time.
  • Cons: Stream setup, boxing and a hash lookup per character add up: in the benchmark below, the anyMatch variants are roughly 2 to 5 times slower than the plain loops when no vowel is present.
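One visible cost in the version above is that mapToObj converts every char into a Character before the Set lookup. A sketch that stays on the primitive IntStream avoids that step. It is a hypothetical variant that was not measured in the benchmark, and both the class name IntStreamVowelChecker and the use of indexOf() as the lookup are assumptions.

```java
final class IntStreamVowelChecker {
    private static final String VOWELS = "aeiouAEIOU";

    // Hypothetical variant (not benchmarked): anyMatch runs directly on the
    // IntStream returned by chars(), so no char is boxed into a Character.
    static boolean hasVowels(String text) {
        return text.chars().anyMatch(ch -> VOWELS.indexOf(ch) >= 0);
    }
}
```

The stream pipeline itself is still created on every call, so this is not expected to close the gap to the lookup-table methods.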

For Maximum Performance: Low-Level Techniques

When every nanosecond counts, these low-level approaches skip general-purpose collections and test bits directly.

4. BitSet Implementation

A BitSet is a memory-efficient data structure that uses bits as flags, providing a great balance of speed and usability.

```java
import java.util.BitSet;

private static final BitSet VOWELS = new BitSet(128);
static {
    VOWELS.set('a'); VOWELS.set('e'); VOWELS.set('i'); VOWELS.set('o'); VOWELS.set('u');
    VOWELS.set('A'); VOWELS.set('E'); VOWELS.set('I'); VOWELS.set('O'); VOWELS.set('U');
}

public boolean hasVowels(String text) {
    for (var i = 0; i < text.length(); i++) {
        if (VOWELS.get(text.charAt(i))) {
            return true;
        }
    }
    return false;
}
```
  • How it works: It uses a BitSet where each character's ASCII value corresponds to a bit. If the bit is 1 (true), it's a vowel.
  • Pros: Extremely fast O(1) lookup and very memory efficient. More readable than manual bit masking.
  • Cons: Primarily suited for character sets like ASCII where characters map cleanly to small integer indices.
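Two details of BitSet make this approach convenient: the table can be filled from a string in one line, and get() returns false for indices past the end of the set (only negative indices throw), so a non-ASCII character needs no explicit bounds check. The sketch below shows both. The helper bitsOf and the class name BitSetVowels are illustrative, not from the original code.

```java
import java.util.BitSet;

final class BitSetVowels {
    // Illustrative helper: sets one bit per character of the given string.
    static BitSet bitsOf(String chars) {
        BitSet bits = new BitSet(128);
        chars.chars().forEach(bits::set);
        return bits;
    }

    static final BitSet VOWELS = bitsOf("aeiouAEIOU");

    static boolean hasVowels(String text) {
        for (int i = 0; i < text.length(); i++) {
            // For a char beyond the BitSet's length (e.g. 'é'), get() simply
            // returns false instead of throwing.
            if (VOWELS.get(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```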

5. Byte Mask Implementation

This is as close to the metal as it gets in plain Java: raw bit-wise operations on long masks instead of any lookup structure.

```java
// One bit per letter: bit 0 = 'a'/'A', bit 1 = 'b'/'B', ..., bit 25 = 'z'/'Z'
private static final long LOWERCASE_MASK = (1L << ('a' - 'a')) | (1L << ('e' - 'a')) |
        (1L << ('i' - 'a')) | (1L << ('o' - 'a')) | (1L << ('u' - 'a'));
private static final long UPPERCASE_MASK = (1L << ('A' - 'A')) | (1L << ('E' - 'A')) |
        (1L << ('I' - 'A')) | (1L << ('O' - 'A')) | (1L << ('U' - 'A'));

public boolean hasVowels(String text) {
    for (var i = 0; i < text.length(); i++) {
        var c = text.charAt(i);
        if (c >= 'a' && c <= 'z') {
            if ((LOWERCASE_MASK & (1L << (c - 'a'))) != 0) return true;
        } else if (c >= 'A' && c <= 'Z') {
            if ((UPPERCASE_MASK & (1L << (c - 'A'))) != 0) return true;
        }
    }
    return false;
}
```
  • How it works: A long is used as a bit-mask where each bit represents a letter of the alphabet. A bit-wise AND operation checks if a character's corresponding bit is set.
  • Pros: Among the fastest methods in the benchmark (tied with the boolean array when a vowel appears early), with minimal memory usage.
  • Cons: Very poor readability. This is “clever” code that is hard to maintain.
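Because 'a' and 'A' both map to bit 0, 'e' and 'E' to bit 4, and so on, LOWERCASE_MASK and UPPERCASE_MASK end up holding the same value. A further hypothetical variant (not benchmarked) folds both cases into one long: every ASCII letter lies between codes 64 and 127, so bit (c - 64) of a single mask can flag all ten vowels, and one range check replaces the two branches. Whether this beats the two-branch version would need its own measurement; the class name SingleMaskVowelChecker is illustrative.

```java
final class SingleMaskVowelChecker {
    // Hypothetical variant: one long covers chars 64..127; bit (c - 64) is set for vowels.
    private static final long VOWEL_MASK = buildMask();

    private static long buildMask() {
        long mask = 0L;
        for (char v : "aeiouAEIOU".toCharArray()) {
            mask |= 1L << (v - 64);
        }
        return mask;
    }

    static boolean hasVowels(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // (c >>> 6) == 1 holds exactly for chars 64..127, so c - 64 is in 0..63.
            if ((c >>> 6) == 1 && ((VOWEL_MASK >>> (c - 64)) & 1L) != 0) {
                return true;
            }
        }
        return false;
    }
}
```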

The Pragmatic Performer: Array-Based Lookup

This technique offers a fantastic compromise, delivering top-tier performance with code that’s still easy to follow.

6. Boolean Array

This creates a simple lookup table, using a character’s integer value as an index for a near-instant check.

```java
private static final boolean[] VOWELS = new boolean[128];
static {
    for (var c : new char[]{'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'}) {
        VOWELS[c] = true;
    }
}

public boolean hasVowels(String text) {
    for (var i = 0; i < text.length(); i++) {
        var c = text.charAt(i);
        if (c < 128 && VOWELS[c]) {
            return true;
        }
    }
    return false;
}
```
  • How it works: An array of 128 booleans (one per ASCII code) is created, and the entries at the vowels' indices are set to true. The c < 128 guard keeps non-ASCII characters from indexing past the end of the array.
  • Pros: Blazing fast O(1) lookup. Conceptually simple and very effective.
  • Cons: Uses a fixed block of memory and is limited to the character set defined by the array size (e.g., ASCII).
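The c < 128 guard is what limits the table to ASCII. A hypothetical variant (not benchmarked) sizes the table to cover every char value, so any character is a valid index and the guard disappears. The price is 65,536 entries (about 64 KB, since HotSpot typically stores each element of a boolean array in one byte) and a larger cache footprint. The class name FullRangeTable is illustrative.

```java
final class FullRangeTable {
    // Hypothetical variant: one entry per possible char value (0..0xFFFF),
    // so text.charAt(i) can never index past the end of the array.
    private static final boolean[] VOWELS = new boolean[Character.MAX_VALUE + 1];

    static {
        for (char c : "aeiouAEIOU".toCharArray()) {
            VOWELS[c] = true;
        }
    }

    static boolean hasVowels(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (VOWELS[text.charAt(i)]) {
                return true;
            }
        }
        return false;
    }
}
```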

Built-in String Operations

These approaches rely on familiar methods of the String class.

7. String.contains()

```java
private static final String VOWELS = "aeiouAEIOU";

public boolean hasVowels(String text) {
    for (var vowel : VOWELS.toCharArray()) {
        if (text.contains(String.valueOf(vowel))) {
            return true;
        }
    }
    return false;
}
```
  • How it works: It loops through each vowel and checks if the entire input string contains it.
  • Pros: Very easy to write and understand.
  • Cons: Scales poorly. It makes up to ten passes over the input (one per vowel) and creates a one-character String for each vowel; when no vowel is present, every pass scans the whole string.

8. String.replace()

A creative but impractical approach that demonstrates string manipulation.

```java
public boolean hasVowels(String text) {
    var current = text;
    var originalLength = text.length();
    for (char vowel : "aeiouAEIOU".toCharArray()) {
        current = current.replace(String.valueOf(vowel), "");
        if (current.length() != originalLength) return true;
    }
    return false;
}
```
  • How it works: It tries to replace each vowel with an empty string. If the string length changes at any point, a vowel must have been present.
  • Pros: A unique way to think about the problem.
  • Cons: Very inefficient: up to ten full passes over the input (one per vowel), and the first successful replace() builds a whole new string just to detect a single vowel.

The Power of Patterns: Regex-Based Implementations

Regular expressions are Java's general-purpose tool for pattern matching in text.

9. Simple Regex find()

A concise and powerful way to solve the problem using a pattern.

```java
import java.util.regex.Pattern;

private static final Pattern VOWEL_PATTERN = Pattern.compile("[aeiouAEIOU]");

public boolean hasVowels(String text) {
    return VOWEL_PATTERN.matcher(text).find();
}
```
  • How it works: A Pattern object is compiled once to represent "any character in this set." The matcher(text).find() method then efficiently searches the string for the first occurrence.
  • Pros: Clean, highly readable, and easily extensible for more complex patterns.
  • Cons: Regex engines carry some overhead, and matcher() allocates a new Matcher on every call, so this is slower than direct array or bit-set lookup for this simple task.
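Since VOWEL_PATTERN.matcher(text) allocates a new Matcher per call, a checker used from a single thread can keep one Matcher and point it at each new input with reset(). This is a hypothetical sketch that was not benchmarked; the JIT's escape analysis may already remove much of the allocation cost in the original, and the class name ReusableMatcherVowelChecker is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReusableMatcherVowelChecker {
    private static final Pattern VOWEL_PATTERN = Pattern.compile("[aeiouAEIOU]");

    // One Matcher per checker instance, re-targeted with reset() on each call.
    // Matcher is not thread-safe, so an instance must not be shared across threads.
    private final Matcher matcher = VOWEL_PATTERN.matcher("");

    boolean hasVowels(String text) {
        return matcher.reset(text).find();
    }
}
```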

10. Regex replaceAll()

Another regex-based approach that works by removing all matches.

```java
private static final String VOWELS_MATCH = "[aeiouAEIOU]";

public boolean hasVowels(String text) {
    var withoutVowels = text.replaceAll(VOWELS_MATCH, "");
    return withoutVowels.length() != text.length();
}
```
  • How it works: It replaces all vowels in the string with nothing and then checks if the string’s length has changed.
  • Pros: Declarative and easy to understand what it’s doing.
  • Cons: Inefficient: it always processes the entire string and, when any vowel is present, builds a new one, even if the very first character is a vowel. String.replaceAll() also compiles the regex again on every call.
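String.replaceAll(regex, replacement) compiles its regex argument on every call. Keeping the same remove-and-compare idea but compiling the pattern once removes that part of the cost, though the method still scans the whole input. This is a hypothetical sketch, not benchmarked; the class name PrecompiledReplaceVowelChecker is illustrative.

```java
import java.util.regex.Pattern;

final class PrecompiledReplaceVowelChecker {
    // Compiled once, unlike text.replaceAll(regex, ""), which compiles per call.
    private static final Pattern VOWEL_PATTERN = Pattern.compile("[aeiouAEIOU]");

    static boolean hasVowels(String text) {
        // Still scans the whole input and, if any vowel is present,
        // builds a new string before the lengths are compared.
        String withoutVowels = VOWEL_PATTERN.matcher(text).replaceAll("");
        return withoutVowels.length() != text.length();
    }
}
```

For a yes/no question, find() on the same precompiled pattern stops at the first match, which is why the plain regex approach scales better than any replace-based one.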

Benchmark

To objectively compare these Java vowel-checking methods, we’re using the Java Microbenchmark Harness (JMH).

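```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Fork(value = 2, warmups = 1)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
public class VowelCheckerBenchmark {
    // Input fields and one @Benchmark method per implementation omitted here.
}
```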

The benchmarks were run on OpenJDK 25 (Temurin-25+36) with -Xms1g -Xmx1g -XX:+UseG1GC. The configuration above measures the average time per operation in nanoseconds, using a single state shared by all benchmark threads (Scope.Benchmark). JMH first runs one warmup fork whose results are discarded, then two measured forks; each fork performs 5 warmup iterations of 1 second followed by 10 measurement iterations of 1 second. Lower is better; 🟢 marks the fastest results and 🔴 the slowest.


| Method | withVowels (ns/op) | withoutVowels (ns/op) |
|------------------|--------------------|-----------------------|
| anyMatchContains | 30.301 | 858.955 |
| anyMatch | 12.123 | 490.683 |
| bitSet | 🟢 **1.481** | 64.707 |
| byteMask | 🟢 **1.297** | 69.118 |
| charArray | 🟢 **1.296** | 🟢 **54.632** |
| contains | 15.513 | 157.358 |
| loopIn | 8.972 | 211.650 |
| loopOr | 8.242 | 174.308 |
| nestedFor | 10.896 | 382.267 |
| recursion | 1.743 | 🔴 **1555.377** |
| regexReplace | 🔴 **428.105** | 118.510 |
| regex | 14.023 | 75.144 |
| stringReplace | 59.367 | 61.914 |

Wrapping up

Top performers (1–2 ns with vowels, ~55–70 ns without)

  • charArray → 1.296 ns (best)
  • byteMask → 1.297 ns
  • bitSet → 1.481 ns

Without vowels:

  • charArray → 54.6 ns (best)
  • bitSet → 64.7 ns
  • byteMask → 69.1 ns

The lookup-based methods (charArray, bitSet, byteMask) are clear winners:

  • Tiny cost when a vowel is found early.
  • Linear in the input length, but still fast when the whole string has to be scanned.

Ranking summary

✅ Best (production-ready, fastest, stable)

  • charArray (fastest, simplest, predictable)
  • bitSet (slightly slower, but elegant)
  • byteMask (comparable, just more obscure)

⚖️ Acceptable (slower but usable if clarity matters)

  • regex (≈14 ns / 75 ns)
  • contains (≈15 ns / 157 ns)

⚠️ Slower (avoid for hot paths)

  • loopIn, loopOr, nestedFor (≈170 to 380 ns without vowels)
  • anyMatch, anyMatchContains (≈490 ns and ≈860 ns without vowels)

🐌 Worst

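  • recursion (fast with an early vowel, but ≈1,555 ns without vowels, the slowest result in the table)
  • regexReplace (≈428 ns with vowels, the slowest result in that column; builds a new string whenever a vowel is present)
  • stringReplace (≈60 ns in both cases, so it gains nothing from an early vowel)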

Takeaways

  • The JVM is very good at optimizing simple array/bit-set lookup.
  • Low-Level is King for Speed: For a simple task like vowel checking, direct array lookup (charArray) or bit-fiddling (byteMask, bitSet) are the most efficient.
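  • For a balance of readability and acceptable performance, regex or contains are reasonable choices.
  • Streams and recursion are costly: without vowels, the stream variants take roughly 490 to 860 ns and the recursive one about 1.5 µs.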

You can find all the code on GitHub.

Originally posted on marconak-matej.medium.com.