python - Split Text into paragraphs NLTK - usage of nltk.tokenize.texttiling?


I was looking for ways to divide documents into paragraphs, and I was told that nltk.tokenize.texttiling is one possible way of doing this.

Here's my attempt to use it; however, I do not understand how to work with the output. I'd appreciate your help.

  from unidecode import unidecode
  import nltk

  t = unidecode(doclist[0].decode('utf-8', 'ignore'))
  nltk.tokenize.texttiling.TextTilingTokenizer(t)

Output:

<nltk.tokenize.texttiling.TextTilingTokenizer object at 0x11e9c6350>

I'm just playing around with this myself for the same reason, so I thought I'd pass on what I know... :)

I'm not sure of all the details yet, but this bug report illustrates how TextTilingTokenizer is used:

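  import nltk

  # Requires the Gutenberg and stopwords corpora: nltk.download('gutenberg'), nltk.download('stopwords')
  alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
  tt = nltk.tokenize.TextTilingTokenizer()
  tiles = tt.tokenize(alice[140309:])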

It appears that you need to pass your text to the tokenize method of a TextTilingTokenizer instance, not to the constructor. What you printed is just the tokenizer object itself.
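To make this concrete for your own text, here is a minimal sketch in Python 3 (your snippet looks like Python 2). It assumes your document is plain text whose paragraphs are separated by blank lines: TextTiling only places boundaries at such breaks and raises a ValueError if it finds none. The sample text and the names `cooking`, `astronomy` and `text` are made up for illustration; in your case you would pass your own decoded string `t`. Passing `stopwords=[]` is only there so the sketch runs without downloading NLTK's stopwords corpus.

```python
from nltk.tokenize import TextTilingTokenizer

# Made-up sample document (hypothetical): two topics, four paragraphs each,
# separated by blank lines. TextTiling only places boundaries at such breaks.
cooking = (
    "The soup simmered slowly while the cook chopped onions, carrots and "
    "celery for the stock. Fresh herbs from the garden went into the pot, "
    "and the kitchen filled with the smell of garlic and butter. "
)
astronomy = (
    "The telescope was pointed at the planet as the night sky cleared over "
    "the observatory. Astronomers measured the orbit of each moon and "
    "recorded the brightness of distant stars and galaxies. "
)
text = "\n\n".join([cooking * 3] * 4 + [astronomy * 3] * 4)

# The constructor only configures the tokenizer: its first positional
# parameter is w (the pseudosentence size), so TextTilingTokenizer(t)
# simply stores your text as w. The text itself goes to tokenize().
# stopwords=[] skips NLTK's stopwords corpus for this sketch; leave it out
# to use the default English list (needs nltk.download("stopwords")).
tt = TextTilingTokenizer(stopwords=[])
tiles = tt.tokenize(text)

# tokenize() returns a plain list of strings, one per topical segment.
# Each tile is a contiguous slice of the input and may span several paragraphs.
for i, tile in enumerate(tiles):
    paragraphs = [p for p in tile.split("\n\n") if p.strip()]
    print(f"tile {i}: {len(paragraphs)} paragraph(s), starts {tile.strip()[:40]!r}")
```

Note that each tile is a topical segment made of one or more of the original paragraphs, not necessarily a single paragraph. If you only need the raw paragraphs, splitting on the blank lines may be enough; TextTiling is useful when you want to group paragraphs by topic.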

