A thesaurus is not a species of dinosaur

Maharera Project Names Wordcloud

November 05, 2019

While browsing the map on the Maharera website, I thought of aggregating the data for all Maharera projects for analysis.

Using Fiddler, I copied the data and set about writing a Python program to analyze it.

import json
import string

names=[]

We read the JSON file and append the words found under Name_of_Project to the names list. Each project name is split on whitespace, and all punctuation is removed from each word.

with open('C:\\Users\\Narayan\\Desktop\\maharera data\\data.json') as json_file:
    data = json.load(json_file)

for element in data:
    for a in element['Name_of_Project'].split():
        # Strip all ASCII punctuation (this already includes apostrophes)
        word1 = a.translate(str.maketrans('', '', string.punctuation))
        names.append(word1)
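
To see what the split-and-strip step produces, here is a minimal, self-contained sketch that runs it on a small inline sample. The sample records and the helper name `extract_words` are made up for illustration; only the `Name_of_Project` key comes from the code above. Unlike the loop above, the sketch also skips tokens that become empty once punctuation is removed, which is an addition of mine.

```python
import json
import string

# Hypothetical sample mimicking the structure of data.json
sample_json = '''
[
  {"Name_of_Project": "Sai Heights - Phase II"},
  {"Name_of_Project": "Shree Ganesh Co-operative Housing Society"}
]
'''

# Translation table that deletes every ASCII punctuation character
STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def extract_words(records):
    """Split each project name on whitespace and strip punctuation."""
    words = []
    for record in records:
        for token in record['Name_of_Project'].split():
            cleaned = token.translate(STRIP_PUNCT)
            if cleaned:  # skip tokens that were pure punctuation, e.g. "-"
                words.append(cleaned)
    return words


sample_names = extract_words(json.loads(sample_json))
print(sample_names)
```

Note that "Co-operative" comes out as "Cooperative", which is why the unhyphenated form is the one that matters when excluding words later.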

This section of the code is not strictly necessary, but I found it useful for spotting common words to add to the stopwords list.

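from collections import Counter
#print(data)
# Use a separate name so the Counter class itself is not shadowed
word_counts = Counter(names)

# The ten most frequent words are candidates for the stopwords list
most_occur = word_counts.most_common(10)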
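
As a rough illustration of how the counting step can surface stopword candidates, the sketch below counts a small, made-up word list. The list `sample_names` is hypothetical, and upper-casing the words before counting is my own assumption (the code above counts them as-is); it simply merges variants such as "Sai" and "SAI".

```python
from collections import Counter

# Hypothetical word list standing in for `names`
sample_names = ["Sai", "Heights", "SAI", "Phase", "Sai", "HEIGHTS", "Park"]

# Normalize case so that "Sai" and "SAI" are counted together
word_counts = Counter(word.upper() for word in sample_names)

# Print the most frequent words with their counts
for word, count in word_counts.most_common(2):
    print(word, count)
```

Generic words that rank high here (such as HEIGHTS) are the ones worth excluding, while distinctive ones (such as SAI) are what the wordcloud should highlight.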

Now we import the wordcloud package and add stopwords (words that are excluded from the wordcloud). STOPWORDS already contains common words; I’ve added further words as needed.

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
stopwords = set(STOPWORDS)
new_stopwords=stopwords.union(["PHASE","PHASEII","COMMERCIAL","COOPERATIVE","Phase","THE",'at',"APARTMENT","RESIDENCY",
                                "PLAZA","COMPLEX","TOWER","HOUSING","SOCIETY","RESIDENCY'"
                                ,"HEIGHTS","BUILDING","CHS","CITY","WING","CHSL","Heights","1",'2','II','A','I',"GARDEN",
                            "PARK","TOWERS","BLDG","VILLA","ARCADE","Redevelopment","PROJECT","APARTMENTS","ENCLAVE","LTD",
                            "HOME","ESTATE","HEIGHT","VIEW","FLOOR","III","NEW",'HOMES','AVENUE','NAGAR',"RESIDENCE"])

def show_wordcloud(data, title=None):
    wordcloud = WordCloud(
        background_color='white',
        stopwords=new_stopwords,
        max_words=800,
        max_font_size=40,
        scale=3,
        random_state=1  # chosen at random by flipping a coin; it was heads
    ).generate(" ".join(data))

    fig = plt.figure(1, figsize=(12, 12), dpi=150)
    plt.axis('off')
    if title:
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=2.3)

    plt.imshow(wordcloud)
    plt.show()

show_wordcloud(names)

The output is shown below:

[Image: wordcloud]

As expected, Sai tops the list.

Feel free to use the code for further analysis.


Narayanan Iyer

Written by Narayanan Iyer, who lives and works in Mumbai. A full-time R and Shiny enthusiast, he spends way too much time on HN. Fluent in R, Shiny, Docker, Python, JavaScript and C#, and 6 other human languages.

You can contact him at narayanan iyer 22 at gmail dot com