Is Text Enhancement Useful for Comprehensible Input?

Ben Adams • December 31, 2025 • comprehensible input

I've been using (and making) comprehensible input for a few years now in many different languages.

Recently, I've come across a specific trend in the videos I'm watching to learn Chinese.

Several creators have started putting a little counter in the bottom right and then highlighting that number of (potentially) new vocabulary words throughout the video.

It looks like this:

Notice the 2/42 in the bottom right and the lājī / 垃圾 in the top left. The other stuff on screen is from the metalayer I use.

As far as I know, Zhangkai Chinese started this specific trend/layout in Chinese videos, but I've seen similar things in pretty much all comprehensible input in various languages.

Example from Slow Czech

Example from Chinese at Dawn

Example from French Comprehensible Input

What is this technique?

This teaching technique is a form of "input enhancement,"¹ but that's a very broad term. Input enhancement can refer to ANYTHING where the aim is to draw attention to the language. Even subtitles are a kind of input enhancement since they draw attention to the writing system, help users parse words, etc.

More specifically, this is a variant of "textual enhancement," where text is modified to make certain things stand out.

Bolding words is a simple and clear example:

Ted needs to go to the store and buy milk.

I don't think there's a specific word for this when it's used in primarily spoken content (such as a video). Calling it a variant of textual enhancement makes sense to me, so that's what I'll continue to call it.

Is textual enhancement useful?

This is the more important question! Is all this effort that the creators go through actually beneficial to the learner? Zhangkai Chinese added 42 different text enhancements to that video, which must have taken dozens of minutes!

The idea of adding the enhancement is tied to the Noticing Hypothesis², which holds that in order for input to "stick," a learner must notice it. You can't learn while you're asleep or distracted, and focusing on a specific thing (such as a word or grammar form) will help you acquire it.

It's unclear whether noticing is required for acquisition, or just beneficial, but in our experience, it's incredibly valuable and important. So even if it's not required, you should still try to do it.

However, we generally recommend that the LEARNER should be the one to notice things. It's more beneficial if you use your own mind and focus to look for words you're learning or familiar with.

This kind of textual enhancement comes from the TEACHER (or source of input). Is that actually beneficial?

What does research say?

Cintrón-Valentín and García-Amaya did some tests⁵ on Spanish learners to see if adding enhancement to subtitles would improve their learning of grammar and/or vocabulary.

The Spanish learners received "traditional" lessons and custom-created videos designed to include the target grammar and vocabulary.

They made 4 groups of learners:

Lessons + videos (no subtitles)
Lessons + videos (subtitles with highlighted vocab)
Lessons + videos (subtitles with highlighted grammar)
Videos (subtitles with highlighted grammar) no lessons

The results were rather interesting! Immediately after the lessons and videos, the learners who had the enhancement did better in that area (learners with highlighted vocab retained more of the new vocab). But that advantage disappeared after a few weeks.

Highlighting things does draw students' attention to them, but that doesn't necessarily mean they get acquired more quickly. Continued exposure to the word or concept is required.

Another study³ actually tracked learners' eyes to see how long they looked at the underlined words. It found that learners did look at the underlined words for longer (as compared to normal, unenhanced words in the subtitles). It also found a positive relationship between how long someone looked at a word and the likelihood they remembered that word on an immediate posttest.

It's also worth noting that the authors say it's hard to state definitively that the underlining caused more noticing, which in turn caused better recall.

But the issue of the benefits vanishing after a week or two wasn't addressed.

An earlier study⁴ also based on eye tracking (carried out by some of the same researchers) tested something different: are full subtitles or "keyword captions" more effective?

Keyword captions are when just a target word/phrase is put on screen, exactly like the examples I have above.

First off, the scores of their learners were overall very low. Not a lot of word learning happened, regardless of the group. However, the keyword caption groups were quite a bit better at recognizing the words that were being taught.

Not understanding the words, just recognizing them. Which makes sense, since those were specifically shown on screen.

However, actually learning the meaning of the words didn't improve with just the keyword captions when compared to the full subtitles. The "intention" of the learner was much more important: the researchers told only half of the participants they would be tested, and everyone who knew that did better on the test, regardless of their subtitle type.

All this is to say that there might be some benefit to including textual enhancement in the form of keyword captions, but it's certainly not an obvious benefit.

It's worth doing more research into variations of this technique, but learning words should probably be done with different activities.

What do my vibes say?

I find that the biggest issue with the text enhancements in these videos is that they're static. They're part of the video and cannot adapt to the learner. For example, I already know 垃圾 (it means garbage), so that callout wasn't useful. It just takes up space.

Or, and this happened to me yesterday, the creator guesses wrong! There was a sentence that said something like "And there's direct access to the subway." It was actually the perfect sentence for me to learn a word, since I knew all but 1 of the words: direct.

But the creator put the word "subway" on the screen.

This didn't actually affect me in any way, since I use other tools to look up words and mostly ignore the text enhancements. They're just not that useful to me.

The final problem with some enhancements is when they include a translation or, in the case of Mandarin, the Pinyin. In the example from Chinese with Dawn above, she includes the Pinyin, character and English meaning. This is useful if:

You want to see Pinyin
You can read the font for the character
You speak English

But if you are trying to get used to reading just characters, or they choose a bad font, or you don't actually know English (or don't want to use the translations without trying to understand first), then this text enhancement isn't very useful! In fact, it might negatively affect your use of the content for learning.

In a similar vein, the Czech example has the word "instrumental," drawing attention to the case (the callout means "to heat with wood"). What if you don't know what instrumental is? Or learned a different name for the case?

Should you add text enhancement to your videos?

In my opinion, no. It's a decent amount of extra work for very little actual benefit. You're basically just guessing which words or concepts your viewers should focus on.

HOWEVER, I do think there's a specific version of this that is very useful.

Targeted Videos

If you make a video that's focusing on a specific topic or concept or something like that, then drawing attention to that topic or concept can be VERY useful.

Using English as an example, let's say you wanted to make a video about the difference between present simple (I eat) and present continuous (I'm eating). In a video like that, it'd be very useful to put examples of those grammar patterns on screen.

Every day I eat breakfast at 9:15, after I come back from the gym. Right now, I'm eating oatmeal and drinking coffee.

Something like that.

Conclusion

At the end of the day, however, it's not a problem to include this kind of textual enhancement. Some people might find it engaging; others can ignore it.

I think, ideally, there'd be a smart, personalized system or extension which could draw your attention to specific words or grammar patterns when YOU need them most...

What do you think? I'd love to hear your thoughts! Let me know via email or join our Discord and ping @mycheze.

Footnotes

[1] Sharwood Smith, M. (1993). Input enhancement in instructed SLA: Theoretical bases. Studies in Second Language Acquisition, 15(2), 165–179.
[2] Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129–158.
[3] Puimège, E., Montero Perez, M., & Peters, E. (2023). Promoting L2 acquisition of multiword units through textually enhanced audiovisual input: An eye-tracking study. Second Language Research, 39(2), 471–492.
[4] Montero Perez, M., Peters, E., & Desmet, P. (2015). Enhancing vocabulary learning through captioned video: An eye-tracking study. The Modern Language Journal, 99(2), 308–328.
[5] Cintrón-Valentín, M. C., & García-Amaya, L. (2021). Investigating textual enhancement and captions in L2 grammar and vocabulary: An experimental study. Studies in Second Language Acquisition, 43(5), 1068–1093.