So, I didn’t use the requests package. I did download it and tried a few things with it, but it didn’t help and I didn’t really get it.

I did find an alternative to it, though: urllib (since I’m on Python 3.6, where urllib.request ships with the standard library).

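# Use the BeautifulSoup and requests Python packages
# to print out a list of all the article titles
# on the New York Times homepage.
# (urllib.request stands in for requests here.)

from bs4 import BeautifulSoup
from urllib.request import urlopen

testurl = 'https://www.nytimes.com/'
soup = BeautifulSoup(urlopen(testurl), 'lxml')

# Dump the whole page, indented, to see its structure.
print(soup.prettify())

# find_all() returns a ResultSet of Tag objects;
# __str__() (same as str(...)) turns it into one long string.
newheadings = soup.find_all(class_="story-heading").__str__()

# Careful: strip() removes any of the characters '<', '/', 'a', '>'
# from both ends of the string, not the substring '</a>'.
test = newheadings.strip('</a>')
print(test)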

I’ll try it again using the requests library.
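
For comparison, here is a minimal sketch of what a requests version might look like. It is not the code from this post: the function names `extract_titles` and `fetch_titles` are made up for illustration, the built-in `'html.parser'` is swapped in for `'lxml'` to avoid an extra dependency, and the `story-heading` class is simply taken from the code above (the live page may no longer use it). Instead of converting the ResultSet to a string, it calls `get_text()` on each tag.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the text of every element with class "story-heading"."""
    soup = BeautifulSoup(html, 'html.parser')
    # get_text(strip=True) drops the tags and trims surrounding whitespace.
    return [tag.get_text(strip=True)
            for tag in soup.find_all(class_='story-heading')]


def fetch_titles(url='https://www.nytimes.com/'):
    """Download the page with requests and return its headline texts."""
    # Imported here so extract_titles() works even without requests installed.
    import requests
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_titles(response.text)
```

Calling `fetch_titles()` and printing each item would give one headline per line, assuming the page still marks headlines with that class.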

Lessons Learned:

  1. New libraries – BeautifulSoup, Requests, urllib
  2. BeautifulSoup and Requests are external libraries which need to be installed using pip/easy_install.
  3. Conversion of a ResultSet to a string, which is done by this line of code:

     soup.find_all(class_="story-heading").__str__()

  4. Using strip, which strips characters or blank spaces from the start and end of a string (check this link), does not work for me. Need to seriously figure this out.
  5. Copy-pasting your own running code can end up not working. Rewrite the whole thing with a clear head as to what you want. I’m still learning and still getting to know a lot of things, but knowing what you want your code to do makes it easier. [Enough philosophy!]
  6. The NY Times page is pretty shabby, though. No offence to the designers.
  7. This was tough. Seriously.
  8. Prettify – Shows your page as neatly indented markup, one tag per line, which is pretty cool.
  9. Oh and here’s the link to see the difference. Sorry. I really need to study more.
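
On lesson 4: a likely reason strip seemed not to work is that `str.strip` treats its argument as a set of characters to remove from both ends, not as a substring. The small sketch below shows the difference on a made-up sample string; `remove_suffix` is a hypothetical helper written for this example (Python 3.9+ has `str.removesuffix`, but that is not available on 3.6).

```python
raw = '<a href="/x">Some headline</a>'

# strip('</a>') removes any of '<', '/', 'a', '>' from both ends
# until it reaches a character that is not in that set.
stripped = raw.strip('</a>')
print(repr(stripped))  # ' href="/x">Some headline'


def remove_suffix(text, suffix):
    """Remove suffix from the end of text only if it is really there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(remove_suffix(raw, '</a>'))  # <a href="/x">Some headline
```

Note how strip also ate the leading `<a`, because both characters are in the set. For pulling the text out of a tag, BeautifulSoup's `get_text()` is usually the simpler route.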