Design a system that finds the most searched keyword on Google. This API will be an internal component of the Google business analyzers.

## Google Keywords by Location and Timestamps

| Keyword | Country | Timestamps |
| --- | --- | --- |
| … | … | … |
| Twitter | USA | 12.00.00.0001 |
| Ankara | TR | 12.00.01.0001 |
| Real Madrid | ESP | 12.00.01.0010 |
| … | … | … |

Let’s say we are on the edge of a serious event. A Google business analyzer may need information like “most searched keyword between 9 pm and 10 pm”. We will provide it through our API.

# **Functional Requirements**

- The API is:

  ```python
  # One entry point per analyzed column
  topSearchedCountry(startTime, endTime)
  topSearchedKeyword(startTime, endTime)
  ```

We want our system to return the most frequent country (or keyword). Because the result changes over time, the caller must also provide a time interval, specifically its start and end time. The examples below use the generic form `topSearched(Column, startTime, endTime)`, where `Column` names the column we want to analyze.

## Example

- `topSearched(Country, 12.00, 13.00)` returns the country that issued the most searches between 12.00 and 13.00.
- Similarly, `topSearched(Keyword, 12.00, 13.00)` returns the most searched keyword worldwide between 12.00 and 13.00.

## Constraints

- The smallest time interval is 30 minutes and the longest is 1 day. Each queried interval is a multiple of 30 minutes.
- For example, the query `topSearched(Country, 12.00, 13.00)` is valid. However, `topSearched(Country, 12.10, 13.10)` is invalid: although it spans 60 minutes, its endpoints do not fall on 30-minute boundaries.

# **Requirements**

- **Scalable:** Since this is Google scale, exploit concurrent computation.
- **Highly Performant:** Return the top entry within a few tens of milliseconds. This requirement hints that the final top entry should be pre-calculated and that heavy computation should be avoided when the `topSearched` API is called.
- **Highly available:** Keep data available despite hardware failures or network partitions.
- **Accuracy:** no downsampling, but results can be *approximate*.

## How would you serve this application?
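One possible starting point, sketched under stated assumptions rather than as a definitive design: because every valid query is aligned to 30-minute boundaries, counts can be pre-aggregated into 30-minute buckets at write time, so a query only merges at most 48 small summaries. The class `TopSearchedIndex`, its methods, and the representation of time as minutes since midnight are hypothetical names chosen for illustration. The sketch keeps exact counts in a single process; a production system would shard and replicate the buckets and might keep only approximate top-k summaries per bucket.

```python
from collections import Counter, defaultdict

BUCKET_MINUTES = 30      # smallest interval allowed by the constraints
MAX_MINUTES = 24 * 60    # longest interval allowed: 1 day


class TopSearchedIndex:
    """Hypothetical in-memory index of per-bucket search counts."""

    def __init__(self):
        # (column, bucket number) -> Counter mapping value -> search count
        self._buckets = defaultdict(Counter)

    def record(self, minute, keyword, country):
        """Count one search; `minute` is minutes since midnight (assumption)."""
        bucket = minute // BUCKET_MINUTES
        self._buckets[("Keyword", bucket)][keyword] += 1
        self._buckets[("Country", bucket)][country] += 1

    def top_searched(self, column, start_minute, end_minute):
        """Return the most frequent value of `column` in [start, end)."""
        if start_minute % BUCKET_MINUTES or end_minute % BUCKET_MINUTES:
            raise ValueError("interval must start and end on a 30-minute boundary")
        length = end_minute - start_minute
        if not BUCKET_MINUTES <= length <= MAX_MINUTES:
            raise ValueError("interval must be between 30 minutes and 1 day long")
        # Merge the pre-aggregated buckets that cover the interval.
        total = Counter()
        first = start_minute // BUCKET_MINUTES
        last = end_minute // BUCKET_MINUTES
        for bucket in range(first, last):
            total.update(self._buckets.get((column, bucket), {}))
        top = total.most_common(1)
        return top[0][0] if top else None


# Usage with rows resembling the sample table (times are illustrative).
index = TopSearchedIndex()
index.record(12 * 60, "Twitter", "USA")
index.record(12 * 60 + 1, "Ankara", "TR")
index.record(12 * 60 + 31, "Twitter", "USA")
print(index.top_searched("Keyword", 12 * 60, 13 * 60))  # Twitter
```

The expensive work (counting) happens when events are recorded, which matches the hint that the answer should be pre-calculated; the query path only validates the interval and merges a bounded number of buckets.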