So, I didn’t use the requests package. I did download it and tried some things with it, but it didn’t help and I didn’t get it.
Although I did find an alternative to it – urllib (since I’m on Python 3.6).
    # Use the BeautifulSoup and requests Python packages
    # to print out a list of all the article titles
    # on the New York Times homepage.
    import string
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    testurl = 'https://www.nytimes.com/'
    soup = BeautifulSoup(urlopen(testurl), 'lxml')
    print(soup.prettify())
    newheadings = soup.find_all(class_="story-heading").__str__()
    test = str.strip(newheadings, '</a>')
    print(test)
I’ll try it out again using the requests library.
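A minimal sketch of how that requests version might look, not a definitive solution. It assumes the headlines still carry the story-heading class used above (the NYT markup may well have changed since), and the helper names extract_titles and fetch_titles are made up for illustration. It uses Python's built-in html.parser instead of lxml to avoid an extra install.

```python
from bs4 import BeautifulSoup


def extract_titles(html, css_class="story-heading"):
    """Return the text of every tag that carries css_class (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() returns only the text inside a tag, so the markup
    # never has to be stripped by hand.
    return [tag.get_text(strip=True) for tag in soup.find_all(class_=css_class)]


def fetch_titles(url="https://www.nytimes.com/"):
    """Download the page with requests and extract the titles (hypothetical helper)."""
    # Imported here so extract_titles() can be tried without requests installed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_titles(response.text)


# Usage (needs network access and the requests package):
#     for title in fetch_titles():
#         print(title)
```

Keeping the parsing in its own function means it can be tried on a small hand-written HTML string first, before touching the live page.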
- New libraries – BeautifulSoup, Requests, urllib
- BeautifulSoup and Requests are external libraries which need to be installed using pip or easy_install.
- Conversion of ResultSet to String, which is done by this line of code: newheadings = soup.find_all(class_="story-heading").__str__()
- Using strip – which is used to strip off characters or blank spaces before or after a string (check this link) – does not work for me. Need to seriously figure this out.
- Copy-pasting your own working code can end up not working. Rewrite the whole thing with a clear head about what you want. I’m still learning and still getting to know a lot of things, but knowing what you want your code to do makes it easier. [Enough philosophy!]
- The NY Times has a very shabby page, though. No offence to the designers.
- This was tough. Seriously.
- Prettify – shows your page as neatly indented markup, one tag per line, which is pretty cool.
- Oh and here’s the link to see the difference. Sorry. I really need to study more.
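On the strip point in the list above: str.strip treats its argument as a set of characters to trim from both ends, not as a substring to remove, which is probably why stripping '</a>' gave odd results. A small sketch with a made-up sample string (raw is illustrative, not real NYT markup):

```python
# Made-up sample markup, only for illustration.
raw = '<a href="/story">Headline text</a>'

# strip('</a>') removes any of the characters '<', '/', 'a', '>' from
# both ends until it meets a character that is not in that set.
stripped = raw.strip('</a>')
print(repr(stripped))

# The same rule on a simpler string: every trailing 'a' and 'n' goes.
print('banana'.strip('an'))

# To drop an exact suffix on Python 3.6, slice it off explicitly.
suffix = '</a>'
without_suffix = raw[:-len(suffix)] if raw.endswith(suffix) else raw
print(without_suffix)
```

So strip is fine for trimming whitespace, but for pulling the text out of tags, calling get_text() on each BeautifulSoup tag is usually the simpler route.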