Maharera Project Names Wordcloud
November 05, 2019
While browsing the map on the Maharera website, I thought of aggregating the data of all Maharera projects for analysis.
Using Fiddler, I copied the project data and wrote a Python program to build a word cloud of the project names.
import json
import string
names = []  # collects every word found in the project names
Next, we read the JSON file, split each value of the Name_of_Project field into words, strip all punctuation from each word and append the result to the names list.
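with open('C:\\Users\\Narayan\\Desktop\\maharera data\\data.json') as json_file:
    data = json.load(json_file)

# Build the punctuation-removal table once instead of once per word
table = str.maketrans('', '', string.punctuation)
for element in data:
    for a in element['Name_of_Project'].split():
        word1 = a.translate(table)  # string.punctuation already includes "'"
        if word1:  # skip tokens that were only punctuation, e.g. "-"
            names.append(word1)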
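A self-contained version of the tokenizing step, runnable without the downloaded JSON, might look like the sketch below. The function name tokenize_project_names and the sample records are hypothetical; I am assuming each record is a dict with a Name_of_Project key, as the loop above implies, and that some records may have no name at all.

```python
import string

# Translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def tokenize_project_names(records, key='Name_of_Project'):
    """Split each project name into words and strip punctuation.

    Hypothetical helper: `records` is assumed to be a list of dicts shaped
    like the entries in the Maharera JSON. Missing or empty names are skipped.
    """
    words = []
    for record in records:
        name = record.get(key) or ''
        for token in name.split():
            word = token.translate(PUNCT_TABLE)
            if word:  # drop tokens that were pure punctuation, e.g. "-"
                words.append(word)
    return words


# Hypothetical sample records, for illustration only
sample = [
    {'Name_of_Project': "SAI KRUPA CHS - PHASE II"},
    {'Name_of_Project': "Shree's Heights"},
    {'Name_of_Project': None},
]
print(tokenize_project_names(sample))
# ['SAI', 'KRUPA', 'CHS', 'PHASE', 'II', 'Shrees', 'Heights']
```

Note that string.punctuation covers only ASCII punctuation, so a typographic apostrophe (’) in a project name would survive this step.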
The next block is not strictly necessary, but it helped me find the most frequent words so I could decide which ones to add to the stopwords list.
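from collections import Counter

# Count word occurrences to spot frequent words worth adding as stopwords.
# Use a new name so the Counter class itself is not shadowed.
word_counts = Counter(names)
most_occur = word_counts.most_common(10)
print(most_occur)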
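Since the stopword list below contains both "PHASE" and "Phase", it may help to count words case-insensitively and to hide words that are already stopwords, so each pass of most_common surfaces only new candidates. This is a hypothetical helper (stopword_candidates is my name for it), sketched under the assumption that stopword matching should ignore case.

```python
from collections import Counter


def stopword_candidates(words, stopwords, n=10):
    """Return the n most frequent words (lower-cased) not yet in stopwords."""
    known = {w.lower() for w in stopwords}
    # Count lower-cased words so "PHASE" and "Phase" are tallied together
    counts = Counter(w.lower() for w in words)
    candidates = [(w, c) for w, c in counts.most_common() if w not in known]
    return candidates[:n]


# Hypothetical word list, for illustration only
words = ['PHASE', 'Phase', 'TOWER', 'SAI', 'TOWER', 'PARK', 'SAI', 'SAI']
print(stopword_candidates(words, {'park'}, n=2))
# [('sai', 3), ('phase', 2)]
```

Re-running it after each addition to the stopword set makes it easy to keep going until the top words are actual project names rather than generic terms.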
Now we import the wordcloud package and extend its stopwords, the words excluded from the word cloud. STOPWORDS already contains common English words; I added generic terms that appear frequently in project names, such as PHASE, TOWER and SOCIETY.
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
stopwords = set(STOPWORDS)
new_stopwords = stopwords.union([
    "PHASE", "PHASEII", "COMMERCIAL", "COOPERATIVE", "Phase", "THE", "at",
    "APARTMENT", "RESIDENCY", "PLAZA", "COMPLEX", "TOWER", "HOUSING", "SOCIETY",
    "RESIDENCY'", "HEIGHTS", "BUILDING", "CHS", "CITY", "WING", "CHSL", "Heights",
    "1", "2", "II", "A", "I", "GARDEN", "PARK", "TOWERS", "BLDG", "VILLA", "ARCADE",
    "Redevelopment", "PROJECT", "APARTMENTS", "ENCLAVE", "LTD", "HOME", "ESTATE",
    "HEIGHT", "VIEW", "FLOOR", "III", "NEW", "HOMES", "AVENUE", "NAGAR", "RESIDENCE",
])
def show_wordcloud(data, title=None):
    wordcloud = WordCloud(
        background_color='white',
        stopwords=new_stopwords,
        max_words=800,
        max_font_size=40,
        scale=3,
        random_state=1  # chosen at random by flipping a coin; it was heads
    ).generate(" ".join(data))

    fig = plt.figure(1, figsize=(12, 12), dpi=150)
    plt.axis('off')
    if title:
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=2.3)

    plt.imshow(wordcloud)
    plt.show()
show_wordcloud(names)
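If you would rather control the counts yourself (for example, reusing the case-insensitive counting above), the wordcloud package also provides WordCloud.generate_from_frequencies, which takes a word-to-count mapping. As far as I understand the package, the stopwords argument is applied only when WordCloud tokenizes raw text, as generate does, so the frequencies should be filtered beforehand. The helper below (filtered_frequencies, a hypothetical name) is a minimal sketch of that filtering step, assuming case-insensitive stopword matching.

```python
from collections import Counter


def filtered_frequencies(words, stopwords):
    """Map each word to its count, skipping stopwords (case-insensitive)."""
    blocked = {w.lower() for w in stopwords}
    return dict(Counter(w for w in words if w.lower() not in blocked))


# Hypothetical word list, for illustration only
words = ['SAI', 'KRUPA', 'CHS', 'PHASE', 'II', 'SAI', 'Heights']
print(filtered_frequencies(words, {'chs', 'phase', 'ii', 'HEIGHTS'}))
# {'SAI': 2, 'KRUPA': 1}
```

The resulting dict could then be passed as WordCloud(...).generate_from_frequencies(freqs) in place of the .generate(" ".join(data)) call.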
The resulting word cloud is shown below.
Feel free to use the code for further analysis.