Posts

Showing posts from June, 2021

Scraping the entire text & keywords from a webpage with the help of newspaper & nltk || Python

I'm using the newspaper and nltk libraries to scrape and summarize articles from a webpage and save them to a text file. Here the "punkt" tokenizer is used for splitting text (a phrase, sentence, or paragraph) into smaller units, such as sentences and individual words or terms.

# importing libraries
from newspaper import Article
import nltk

# download the tokenizer data
nltk.download('punkt')

# input website and create an object for the article
url = 'https://www.marketwatch.com/'
article = Article(url, language="en")

# downloading/parsing/NLP on the article
article.download()
article.parse()
article.nlp()

# printing the scraped and processed data
print("Article Title:")
print(article.title)  # prints the title of the article
print("\n")
print("Article Text:")
print(article.text)  # prints the entire text of...
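The excerpt above is cut off before the text-file step, so here is a minimal sketch of how the scraped fields might be written to a file. The helper names `format_article` and `scrape_to_file`, the "Article Summary:" and "Article Keywords:" labels, and the output path are my own assumptions for illustration, not the original post's code. It relies on `article.summary` and `article.keywords`, which newspaper fills in after `article.nlp()`.

```python
from pathlib import Path


def format_article(title, text, summary, keywords):
    """Lay out the scraped fields as one plain-text report (hypothetical layout)."""
    return "\n".join([
        "Article Title:",
        title,
        "",
        "Article Summary:",
        summary,
        "",
        "Article Keywords:",
        ", ".join(keywords),
        "",
        "Article Text:",
        text,
        "",
    ])


def scrape_to_file(url, out_path):
    """Download, parse and summarize one article, then write it to out_path."""
    # Imported here so format_article stays usable without these packages.
    import nltk
    from newspaper import Article

    nltk.download("punkt", quiet=True)  # tokenizer data needed by article.nlp()

    article = Article(url, language="en")
    article.download()
    article.parse()
    article.nlp()  # fills article.summary and article.keywords

    report = format_article(
        article.title, article.text, article.summary, article.keywords
    )
    Path(out_path).write_text(report, encoding="utf-8")
    return report
```

Calling `scrape_to_file('https://www.marketwatch.com/', 'article.txt')` would then save the result next to the script. Keeping the formatting in its own function makes it easy to check the file layout without hitting the network.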