python - Split text into paragraphs with NLTK - usage of nltk.tokenize.texttiling?
I was looking for ways to divide documents into paragraphs, and I was told that nltk.tokenize.texttiling is one possible way of doing this.
Here is my attempt to use it. However, I do not understand how to work with the output. I would appreciate your help.
    t = unidecode(doclist[0].decode('utf-8', 'ignore'))
    nltk.tokenize.texttiling.TextTilingTokenizer(t)
Output:
    <nltk.tokenize.texttiling.TextTilingTokenizer object at 0x11e9c6350>
I have just been experimenting with this for the same reason and came across your question, so I will do my best to pass on what I know ... :)
I am not sure about the details yet, but the use of TextTilingTokenizer is illustrated in a bug report:
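    import nltk

    # Load the raw text of "Alice in Wonderland" from the Gutenberg corpus
    alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
    # Create the tokenizer with its default settings
    ttt = nltk.tokenize.TextTilingTokenizer()
    # Segment the tail of the text into tiles
    tiles = ttt.tokenize(alice[140309:])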
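So it appears that you want to pass your text to the tokenize method of a TextTilingTokenizer instance, rather than to the constructor. Your output is just the repr of the tokenizer object you created, not a segmentation of the text.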
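To show how the output can be handled, here is a minimal sketch. It assumes that nltk, numpy, and the nltk "stopwords" corpus are installed, and that the text contains blank-line paragraph breaks, which as far as I know the tokenizer expects. The helper names split_into_tiles and preview are my own and not part of NLTK. With the default settings, tokenize returns a plain list of strings, one per segment.

```python
def split_into_tiles(text, tokenizer=None):
    """Return the segments ("tiles") of text as a list of stripped strings.

    tokenizer is any object with a tokenize(text) method that returns a list
    of strings. When omitted, a TextTilingTokenizer with default settings is used.
    """
    if tokenizer is None:
        # Assumption: nltk, numpy and the "stopwords" corpus are available,
        # e.g. after running nltk.download('stopwords').
        from nltk.tokenize import TextTilingTokenizer
        tokenizer = TextTilingTokenizer()
    # Call tokenize() on the instance; the constructor takes settings, not text.
    tiles = tokenizer.tokenize(text)
    # Drop surrounding whitespace and any empty segments.
    return [tile.strip() for tile in tiles if tile.strip()]


def preview(tiles, width=60):
    """Return one short line per tile (index plus leading text) for inspection."""
    return ["%d: %s" % (i, " ".join(tile.split())[:width])
            for i, tile in enumerate(tiles)]
```

With the variable t from the question, something like `for line in preview(split_into_tiles(t)): print(line)` would then print the start of each segment. The tokenizer is passed in as an optional argument only so that the helper can be tried out without NLTK installed.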