the discovery of a complete training set

Write an algorithm (or two algorithms for a pair) in plain English to show your approach for the discovery of a complete training set for 50 topics and the corresponding 50 datasets (Training101 to Training150). Your approach should be generic, meaning it should be feasible to use for all (or most) topics. For each topic, e.g., Topic102, you should use the following inputs and generate the following output.

Inputs: query Q = a topic (you may use the title, e.g., ‘Convicts repeat offenders’, or all information including the and ); and U = folder “Training102”.

Output: D = D+ ∪ D-, where D+ ∩ D- = ∅ and D ⊆ U. The following are possible outputs in D (not the answer) for topic 102:

R102 73038 1

R102 26061 1

R102 65414 1

R102 57914 1

R102 58476 1

R102 76635 1

R102 12769 1

R102 12767 1

R102 25096 1

R102 78836 1

R102 82227 1

R102 26611 1

R102 15200 0

R102 13320 0

R102 54745 0

R102 15082 0

R102 53523 0

R102 65306 0

R102 68419 0

R102 29920 0

R102 30456 0

R102 75563 0

R102 28657 0

R102 65394 0

R102 85372 0
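The input/output contract above can be illustrated with a minimal Python sketch. It is one possible approach, not the expected answer: it assumes the documents in U have already been read into a dictionary of document ID to text, it uses BM25 ranking as a stand-in scoring method, and it treats the top-ranked fraction as D+ and the rest as D-. The names bm25_scores, discover_training_set and top_ratio are hypothetical, and the parameter values are common defaults rather than tuned choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in docs (doc_id -> text) against the query with BM25."""
    n = len(docs)
    if n == 0:
        return {}
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(t) for t in tokens.values()) / n or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter()
    for t in tokens.values():
        df.update(set(t))
    q_terms = set(tokenize(query))
    scores = {}
    for doc_id, t in tokens.items():
        tf = Counter(t)
        score = 0.0
        for term in q_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def discover_training_set(topic_id, query, docs, top_ratio=0.25):
    """Split U into D+ (top-ranked) and D- (the rest) and format the output lines.

    top_ratio is an assumed cut-off, not a value given in the task.
    """
    scores = bm25_scores(query, docs)
    # Sort by descending score; break ties by document ID for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    k = max(1, int(len(ranked) * top_ratio))
    d_pos = set(ranked[:k])
    d_neg = set(ranked[k:])
    lines = [f"R{topic_id} {d} {1 if d in d_pos else 0}" for d in ranked]
    return d_pos, d_neg, lines


if __name__ == "__main__":
    # Toy documents invented for illustration; they are not from Training102.
    toy_docs = {
        "1001": "convicts released from prison often become repeat offenders",
        "1002": "stock markets rallied on strong earnings",
        "1003": "repeat offenders face longer sentences",
        "1004": "football club wins the cup final",
    }
    _, _, out = discover_training_set(102, "Convicts repeat offenders", toy_docs, top_ratio=0.5)
    print("\n".join(out))
```

Because every document in U receives exactly one label, D+ and D- are disjoint by construction and D is a subset of U. A fixed ratio is a crude cut-off; a score threshold or a second pass that re-ranks with terms from the initial D+ are possible refinements.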

Q2) (6 marks) Implement the algorithm (or two algorithms for a pair) in Python. You also need to discuss the output to justify why the proposed algorithm is likely to generate high-quality training sets. You may use figures to support the justification.
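As a starting point for discussing the output, the sketch below parses lines in the "R102 docid label" format and reports a few basic properties: the sizes of D+ and D-, whether they are disjoint, and whether D stays inside U. It is only a sanity check under stated assumptions, not a full quality justification. The function names parse_output and summarize are hypothetical, and the universe passed in is assumed to be the set of document IDs found in the topic's folder.

```python
def parse_output(lines):
    """Read 'R<topic> <doc_id> <label>' lines into the sets D+ and D-."""
    d_pos, d_neg = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _topic, doc_id, label = parts
        if label == "1":
            d_pos.add(doc_id)
        elif label == "0":
            d_neg.add(doc_id)
        # Any other label is ignored in this sketch.
    return d_pos, d_neg


def summarize(lines, universe):
    """Report set sizes and the constraints D+ ∩ D- = ∅ and D ⊆ U."""
    d_pos, d_neg = parse_output(lines)
    d_all = d_pos | d_neg
    universe = set(universe)
    return {
        "n_pos": len(d_pos),
        "n_neg": len(d_neg),
        "disjoint": d_pos.isdisjoint(d_neg),
        "subset_of_U": d_all <= universe,
        # Fraction of U that received a label.
        "coverage": len(d_all) / len(universe) if universe else 0.0,
    }


if __name__ == "__main__":
    # Four of the sample lines for topic 102; "99999" is an invented extra ID
    # standing in for a document in U that was left unlabelled.
    sample = ["R102 73038 1", "R102 26061 1", "R102 15200 0", "R102 13320 0"]
    print(summarize(sample, {"73038", "26061", "15200", "13320", "99999"}))
```

Counts like these could feed a simple figure, such as a bar chart of the D+ and D- sizes per topic, alongside any comparison of ranking scores between the two sets.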

