Count and index the word occurrences in a given text

The binary file Article contains a number of texts. Count how many times each word appears and build an index that records, for each word, the texts it occurs in. Save the result as the binary file Article_index, then find the texts in which both "will" and "about" appear.

Solution:

    A                       B                                          C                                  D                                                    E
1   =file("Article")        =create(Word,Count,Index)
2   =A1.cursor@b()
3   for A2                  =A3.Text                                   =len(B3)
4                           =C3.(mid(B3,#,1))                          =B4.(if(isalpha(~),lower(~)," "))
5                           =C4.select(~!=" "||~[-1]!=" ")             =B5.conj@s().array(" ")
6                           =C5.groups(string(~):Word;count(~):Count)  =B6.len()
7                           =B1.len()                                  >i1=min(1,B7)                      >i2=1
8                           for i2<=C6                                 =B1.m(i1)                          =B6(i2)
9                                                                      =cmp(C8.Word,D8.Word)
10                                                                     if i1==0||C9>0                     >B1.insert(i1,D8.Word,D8.Count,[#A3])
11                                                                                                        =i2=i2+1
12                                                                                                        >i1=if(i1>0 && i1<B7,i1+1,0)
13                                                                     else if C9<0                       >i1=if(i1>0 && i1<B7,i1+1,0)
14                                                                     else                               >B1.modify(i1,Count+D8.Count:Count,Index|#A3:Index)
15                                                                                                        >i2=i2+1
16                                                                                                        >i1=if(i1>0&&i1<B7,i1+1,0)
17  =file("Article_index")  >A17.export@b(B1)
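
For readers who want to cross-check the logic outside esProc, the same idea can be sketched in Python: split each text into lower-case words, count them per text, and merge the sorted per-text counts into a sorted index. This is only an illustrative sketch, not the author's code. The names tokenize and build_index are hypothetical, the texts are assumed to be already loaded into a list of strings, and Python's str.isalpha accepts non-English letters, which may differ from isalpha in the grid above.

```python
from collections import Counter


def tokenize(text):
    """Lower-case the letters, turn every other character into a space, split into words."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def build_index(texts):
    """Return rows [word, count, [text numbers]] kept sorted by word.

    Assumption: texts is an in-memory list of strings, numbered from 1.
    """
    index = []
    for text_no, text in enumerate(texts, start=1):
        # Per-text word counts, sorted by word so they can be merged in one pass.
        per_text = sorted(Counter(tokenize(text)).items())
        merged = []
        i = j = 0
        while i < len(index) and j < len(per_text):
            word, count = per_text[j]
            if index[i][0] < word:
                # Existing word sorts first: keep it and advance in the index.
                merged.append(index[i])
                i += 1
            elif index[i][0] > word:
                # New word sorts first: insert it before the existing one.
                merged.append([word, count, [text_no]])
                j += 1
            else:
                # Same word: add the counts and extend the list of text numbers.
                merged.append([word, index[i][1] + count, index[i][2] + [text_no]])
                i += 1
                j += 1
        merged.extend(index[i:])
        merged.extend([w, c, [text_no]] for w, c in per_text[j:])
        index = merged
    return index
```

Calling build_index on a list of strings returns one row per distinct word, with the total count and the numbers of the texts containing it, in the same Word, Count, Index layout as the table sequence created in B1.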

The problem statement places no limit on memory, and the number of distinct English words is limited, so the whole index can be held directly in a table sequence (TSeq). With the binary index file Article_index, the texts containing one or more keywords can then be found:

    A                   B                        C       D
1   =file("Article")    =file("Article_index")
2   =B1.import@b()      [about,will]
3   =A2.(Word).pos(B2)  =A2(A3).(Index).isect()
4   =A1.cursor@b()      []                       0
5   for B3              >A4.skip(A5-C4-1)        >C4=A5
6                       >B4=B4|A4.fetch(1)
7   >A4.close()
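
The lookup above can be sketched in Python as well: intersect the text-number lists of all keywords, then walk the texts once, skipping forward to each hit, much as the cursor does with skip and fetch. Again this is a hedged illustration rather than the esProc implementation. The names texts_with_all and fetch_texts are hypothetical, and the index is assumed to be a list of [word, count, [text numbers]] rows already loaded into memory.

```python
def texts_with_all(index, keywords):
    """Return the sorted text numbers that appear in the index entry of every keyword."""
    by_word = {word: set(text_nos) for word, _count, text_nos in index}
    hits = None
    for kw in keywords:
        nos = by_word.get(kw, set())  # an unknown keyword matches no text
        hits = nos if hits is None else hits & nos
    return sorted(hits or [])


def fetch_texts(records, text_nos):
    """Read a forward-only sequence once, skipping to each wanted record.

    text_nos must be ascending and 1-based, like the index entries.
    """
    it = iter(records)
    fetched, last = [], 0
    for no in text_nos:
        for _ in range(no - last - 1):  # skip the records between two hits
            next(it)
        fetched.append(next(it))
        last = no
    return fetched


# Usage with a tiny hand-made index (illustrative data only).
sample_index = [["about", 2, [1, 2]], ["will", 3, [1, 3]]]
sample_texts = ["We will talk about it.", "About time!", "Will you? I will."]
matches = fetch_texts(sample_texts, texts_with_all(sample_index, ["about", "will"]))
```

Reading the texts in ascending order of their numbers is what lets a forward-only cursor serve every hit in a single pass.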

The sorted index TSeq is as follows:

The final result is as follows: