I have got a slightly complicated problem. I am building a web application using PHP/MySQL that should let a user search through the data efficiently, much like Google does, and return the most relevant results.
Basically, what I want is a scalable solution, preferably exposed as a REST service, that returns structured XML when you pass it a query parameter. The main problem is the kind of data I have: each dataset contains an XML file for every individual record, and each dataset has its own schema.
The datasets contain around 200,000 records for now, and in the future this could grow to a million.
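To give an idea of the interface I am after (the endpoint and field names here are made up purely for illustration), a request such as /search?q=solar+radiation would ideally return something like:

    <results total="2" query="solar radiation">
      <result score="0.92" dataset="dataset-A">
        <id>rec-00123</id>
        <title>Solar radiation measurements 1990-2010</title>
      </result>
      <result score="0.87" dataset="dataset-B">
        <id>rec-04711</id>
        <title>Daily solar irradiance by station</title>
      </result>
    </results>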
So, a couple of questions:
How best to map this data so that a keyword search always retrieves the most relevant entries, and not just a sequential bunch of entries?
I tried Apache Solr, but I couldn't figure out whether its schema is flexible enough to let me add arbitrary fields corresponding to the XML I have and define their types; it looks like there is a particular schema syntax that has to be followed. A sketch of what I was hoping for is below.
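For example, what I was hoping to be able to declare in schema.xml is something along these lines (the suffixes and field types are only placeholders, and I am not sure this is the intended way to handle arbitrary per-dataset fields):

    <!-- catch-all dynamic fields so any dataset field could be indexed by suffix -->
    <dynamicField name="*_s"  type="string"       indexed="true" stored="true"/>
    <dynamicField name="*_t"  type="text_general" indexed="true" stored="true"/>
    <dynamicField name="*_dt" type="date"         indexed="true" stored="true"/>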
The most important thing is to make this system scalable, so that different types of metadata datasets can be imported in a plug-and-play fashion, get indexed appropriately, and become available for searching. I think this is only possible if a mapping schema can be developed, but I am open to any other ideas; a rough sketch of what I mean follows.
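What I had in mind for the plug-and-play part is a small per-dataset mapping file (a purely hypothetical format, not something Solr provides out of the box) that tells the indexer which XML elements of that dataset map to which search fields, e.g.:

    <!-- hypothetical mapping file for dataset-A -->
    <mapping dataset="dataset-A">
      <field source="/record/title"        target="title_t"/>
      <field source="/record/creator/name" target="author_s"/>
      <field source="/record/dateIssued"   target="published_dt"/>
    </mapping>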
I have attached the two sample dataset files.