Design a system that finds the top 100 most searched entries on Google. This API will be an internal tool for Google business analysts. Assume that at Google you receive the following streaming table.

## Google Keywords by Location and Timestamp

| Keyword | Country | Timestamp |
| --- | --- | --- |
| … | … | … |
| Twitter | USA | 12.00.00.0001 |
| Ankara | TR | 12.00.01.0001 |
| Real Madrid | ESP | 12.00.01.0010 |
| … | … | … |

Suppose we are on the edge of a serious event. A Google business analyst may need information like "the 100 most searched keywords between 9 pm and 10 pm". We will provide this through our API.

# Functional Requirements

- The API is

```python
topKCountry(k, startTime, endTime)
```

We want our system to return a list of the k most frequent keywords. Because this list changes over time, we also need to provide a time interval, specifically the start and end time of that interval. The function name indicates which column (e.g. Country or Keyword) we want to analyze.

## Example

- `topKCountry(2, 12.00, 13.00)` means: bring the top 2 countries that performed the most searches between 12.00 and 13.00.
- Similarly, `topKKeyword(2, 12.00, 13.00)` means: bring the top 2 keywords searched between 12.00 and 13.00 across the whole world.

## Constraints

- The smallest time interval is 30 minutes and the longest is 1 day. Each queried time interval is a multiple of 30 minutes.
- For example, the query `topKCountry(2, 12.00, 13.00)` is valid, whereas `topKCountry(2, 12.10, 13.10)` is invalid.
- k ≤ 100

# Requirements

- **Scalable:** Since it is Google, exploit concurrent computation.
- **Highly performant:** Return the top 100 list within a few tens of milliseconds. This performance requirement is a hint that the final top-k list should be pre-calculated, and heavy calculations should be avoided inside the topK API call itself.
- **Highly available:** Keep data available in case of hardware failures or network partitions.
- **Accuracy:** No downsampling; approximate results are acceptable.
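The interval constraints above translate into a simple validity check. Below is a minimal sketch, under the assumption that times are expressed as minutes since midnight (so the document's `12.00` corresponds to 720 and `12.10` to 730); the function name `is_valid_interval` is illustrative, not part of the API:

```python
def is_valid_interval(start_minute: int, end_minute: int) -> bool:
    """Check the Constraints section: both boundaries must fall on
    30-minute marks, and the duration must be between 30 minutes
    and 1 day (1440 minutes)."""
    duration = end_minute - start_minute
    return (start_minute % 30 == 0
            and end_minute % 30 == 0
            and 30 <= duration <= 24 * 60)
```

For example, `is_valid_interval(720, 780)` (12.00 to 13.00) passes, while `is_valid_interval(730, 790)` (12.10 to 13.10) fails because its boundaries are not on 30-minute marks.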
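The pre-calculation hint in the performance requirement can be sketched as follows: count keyword frequencies per 30-minute bucket as events stream in, then merge only whole buckets at query time. This is a single-machine illustration with hypothetical names (`WindowStore`, `record`, `top_k`); a real deployment would shard these windows across many machines and pre-merge popular intervals.

```python
from collections import Counter

WINDOW_MINUTES = 30

class WindowStore:
    """Keeps one Counter of keyword frequencies per 30-minute window,
    keyed by the window's start minute of the day."""

    def __init__(self):
        self.windows = {}  # window start minute -> Counter

    def record(self, keyword: str, minute_of_day: int) -> None:
        # Round the event's timestamp down to its 30-minute bucket.
        window = (minute_of_day // WINDOW_MINUTES) * WINDOW_MINUTES
        self.windows.setdefault(window, Counter())[keyword] += 1

    def top_k(self, k: int, start_minute: int, end_minute: int) -> list:
        # Queries must align to 30-minute boundaries (see Constraints),
        # so an interval is always a union of whole windows.
        assert start_minute % WINDOW_MINUTES == 0
        assert end_minute % WINDOW_MINUTES == 0
        merged = Counter()
        for w in range(start_minute, end_minute, WINDOW_MINUTES):
            merged.update(self.windows.get(w, Counter()))
        return [kw for kw, _ in merged.most_common(k)]
```

Because each window is a closed, pre-aggregated unit, the query-time work is only a merge of at most 48 counters (one day of 30-minute windows), not a scan of raw events.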
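Since the accuracy requirement allows approximate results, per-window counts need not be exact: a count-min sketch can track keyword frequencies in fixed memory, which matters when a window contains millions of distinct keywords. The following is a minimal illustration, not a production implementation; the md5-based hashing and the `width`/`depth` defaults are arbitrary choices that trade memory for error.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter in fixed memory.
    Estimates may overcount (due to hash collisions) but
    never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _slot(self, item: str, row: int) -> int:
        # One independent hash per row, derived from md5.
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item: str) -> None:
        for row in range(self.depth):
            self.table[row][self._slot(item, row)] += 1

    def estimate(self, item: str) -> int:
        # The minimum across rows is the least-collided count.
        return min(self.table[row][self._slot(item, row)]
                   for row in range(self.depth))
```

One sketch per 30-minute window would keep memory bounded; the candidate top-k keywords themselves would still be tracked alongside it (e.g. with a small heap), since a sketch alone cannot enumerate its members.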