MPOB’s Malaysian crude palm oil production data: another data-fetching exercise with Python and BeautifulSoup

We needed up-to-date Malaysian crude palm oil (CPO) production data, which the Malaysian Palm Oil Board (MPOB) publishes on its website. All that remained was to write an ugly but fairly reliable Python snippet; Python does a lot in only a few lines of code. As in previous posts, we relied on the BeautifulSoup library.
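
BeautifulSoup does the heavy lifting here. For readers who want to see what the table lookup amounts to, or who do not have bs4 installed, here is a minimal Python 3 sketch that swaps in the standard library's html.parser instead. It records the text of every td cell by table and row, so a cell is addressed as tables[t][row][cell], roughly like findAll('table')[t].findAll('tr')[row].findAll('td')[cell] in the script. The class TableCellParser, the helper clean_number and the HTML fragment are hypothetical; the fragment is not the real MPOB markup.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <table> and <tr>.

    After feeding a page, self.tables[t][r][c] is the text of cell c
    in row r of table t (only <td> cells are counted, not <th>).
    """

    def __init__(self):
        super().__init__()
        self.tables = []      # list of tables, each a list of rows of cell texts
        self._in_td = False   # True while we are inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text outside cells (e.g. whitespace between tags) is ignored
        if self._in_td:
            self.tables[-1][-1][-1] += data


def clean_number(text):
    """Strip thousands separators and surrounding whitespace."""
    return text.replace(",", "").strip()


# Hypothetical fragment shaped like a report table (not real MPOB markup)
SAMPLE = """
<table>
  <tr><td>State</td><td>x</td><td>Jan</td></tr>
  <tr><td>Johor</td><td>x</td><td>145,637</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(SAMPLE)
print(clean_number(parser.tables[0][1][2]))  # 145637
```

Unlike bs4's .string, which returns None when a cell has several children, this sketch simply concatenates all text found inside a cell.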

Readers who need an additional year of data (2005) can easily modify the code below to fetch it from the MPOB website.
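
The script builds one URL per year by appending the year and the suffix 44 to the report address. The Python 3 sketch below isolates that step in a hypothetical helper, report_urls. Fetching 2005 then only means lowering the first year, assuming (we have not checked this) that the 2005 page follows the same URL pattern and table layout.

```python
BASE_URL = "http://bepi.mpob.gov.my/stat/web_report1.php?val="


def report_urls(oldest, newest, suffix="44"):
    """Return one report URL per year, from oldest to newest inclusive.

    Follows the pattern used in the script: BASE_URL + year + '44'.
    The meaning of the '44' suffix is not documented; it is kept as-is.
    """
    return [BASE_URL + str(year) + suffix for year in range(oldest, newest + 1)]


# Assumes 2005 is served under the same pattern (not verified)
for url in report_urls(2005, 2007):
    print(url)
# http://bepi.mpob.gov.my/stat/web_report1.php?val=200544
# http://bepi.mpob.gov.my/stat/web_report1.php?val=200644
# http://bepi.mpob.gov.my/stat/web_report1.php?val=200744
```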

The data are printed to the screen in a CSV-like format, so they can easily be copied and pasted into MS Excel.
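
Rather than copying from the screen, the values can be written straight to a .csv file that Excel opens directly. Here is a minimal Python 3 sketch using the standard csv module. It assumes the flat list of values is ordered as the script's scraping loops build it (per year, the 14 regions in header order, 12 monthly values each, January first); the names write_csv, MONTHS and COLUMNS are hypothetical. Blank cells, such as months not yet published, become empty fields.

```python
import csv
import io

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# Column names copied from the script's header line
COLUMNS = ['johor', 'kedah', 'kelantan', 'negeri_sembilan', 'pahang', 'perak',
           'selangor', 'terennganu', 'other_states', 'p_malaysia', 'sabah',
           'sarawak', 'sabah_sarawak', 'malaysia']


def write_csv(data, oldest, out):
    """Write one CSV row per month: the date, then one value per region.

    Assumes 'data' is ordered per year, region after region (COLUMNS order),
    with 12 monthly values Jan..Dec for each region.
    """
    per_year = len(COLUMNS) * 12
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['date'] + COLUMNS)
    for y in range(len(data) // per_year):
        block = data[y * per_year:(y + 1) * per_year]
        for m, month in enumerate(MONTHS):
            # Value for month m of region r sits at offset r * 12 + m
            values = [block[r * 12 + m].strip() for r in range(len(COLUMNS))]
            writer.writerow(['%s-%d' % (month, oldest + y)] + values)


# Tiny synthetic input: one year in which every region reports 1, 2, ..., 12
fake = [str(m + 1) for _ in COLUMNS for m in range(12)]
buf = io.StringIO()
write_csv(fake, 2006, buf)
print(buf.getvalue().splitlines()[1])  # Jan-2006 followed by fourteen 1s
```

In the script itself, write_csv(data, oldest, f) could replace the rearranging and printing steps, with f opened as open('cpo.csv', 'w', newline=''), as the csv module documentation recommends.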

#####################################################
# Edouard TALLENT @TaGoMa.Tech                      #
# Scraping MPOB's data on CPO                       #
# Website http://bepi.mpob.gov.my/                  #
# QuantCorner @ http://quantcorner.wordpress.com    #
# April, 2014                                       #
#####################################################

# Required modules
import urllib2                          # Read webpages
from bs4 import BeautifulSoup           # bs4 HTML parsing functions
import re                               # Regex, for removing those bloody commas

# Arrays that will contain the desired data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul',  'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
data = []
dates = []

# Range of years to fetch
oldest = 2006 # Data are available back to 2006
newest = 2014 # Most recent year for the data

for year in range(oldest, newest + 1):
    # Construct the URL of the yearly report page
    url = 'http://bepi.mpob.gov.my/stat/web_report1.php?val=' + str(year) + '44'

    # Read the HTML page content
    page = urllib2.urlopen(url)

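    # Parse the page; naming the parser explicitly keeps the result independent
    # of which optional parsers (e.g. lxml) happen to be installed
    soup = BeautifulSoup(page, 'html.parser')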

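    # Find all the tables in the whole HTML code
    tables = soup.findAll('table')

    # Rows 2-15 hold the 14 regions listed in the output header below;
    # the first two tables hold six months each (values in <td> 2, 4, ..., 12),
    # so each region contributes 12 monthly values per year
    for y in range(2, 16):
        for table in range(0, 2):
            for x in range(2, 13, 2):
                # Strip the thousands separators before storing the value
                data.append(re.sub(',', '', tables[table].findAll('tr')[y].findAll('td')[x].string))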

# Construct the date series (%b-%Y format)
for year in range(oldest, newest + 1):
    for month in months:
        dates.append(str(month) + '-' + str(year)) # Construct the %b-%Y series

# Rearrange the data: 'data' holds 12 monthly values per region, region
# after region (14 regions), year after year; arr[i] collects region i's series
arr = [[] for _ in range(14)]

for dat in range(0, len(data)):
    arr[(dat // 12) % 14].append(data[dat])

# Print out to screen
print 'date,johor,kedah,kelantan,negeri_sembilan,pahang,perak,selangor,terennganu,\
other_states,p_malaysia,sabah,sarawak,sabah_sarawak,malaysia'
for x in range(0, len(arr[0])):
    print ','.join([dates[x]] + [col[x] for col in arr])

'''
# Output
date,johor,kedah,kelantan,negeri_sembilan,pahang,perak,selangor,terennganu,other_states,p_malaysia,sabah,sarawak,sabah_sarawak,malaysia
Jan-2006,145637,10794,12146,22321,108659,100251,34323,21896,8353,464380,376082,96131,472213,936593
Feb-2006,193908,16102,14455,33604,137622,130834,43014,23719,12684,605942,346248,99714,445962,1051904
Mar-2006,229324,20869,18202,40836,165481,143827,52195,28503,14518,713755,420011,109438,529449,1243204
Apr-2006,240346,19849,23569,42401,205117,145081,55438,30422,13407,775630,429013,106163,535176,1310806
May-2006,267948,20124,23364,45848,217454,149685,56463,31015,14017,825918,445565,119864,565429,1391347
Jun-2006,260255,19526,19727,44352,205540,151136,52726,27236,14609,795107,416951,116339,533290,1328397
Jul-2006,265266,21785,19703,44032,217229,167330,54751,35381,14875,840352,408907,125802,534709,1375061
Aug-2006,290846,23654,21271,49627,230885,181439,62253,41700,17281,918956,465689,152591,618280,1537236
Sep-2006,279002,23203,21712,50783,229117,175558,62463,46210,17141,905189,543770,161081,704851,1610040
Oct-2006,226696,16678,17696,40735,195544,130907,48347,38468,13593,728664,526143,145329,671472,1400136
Nov-2006,260429,17528,23801,45404,227029,137138,51475,48260,14099,825163,575674,150847,726521,1551684
Dec-2006,161383,11852,17637,31411,153225,111286,42045,33883,10269,572991,451564,119823,571387,1144378
Jan-2007,160215,14767,14828,30328,142886,129135,43464,29282,11510,576415,423055,115870,538925,1115340
Feb-2007,148427,15210,14605,28648,123048,106760,39234,24622,9437,509991,384933,94757,479690,989681
Mar-2007,172007,19307,17257,32485,144310,128407,47182,26232,11623,598810,381805,100615,482420,1081230
Apr-2007,182575,16421,19323,31683,152799,121641,45247,30267,10262,610218,402620,112888,515508,1125726
May-2007,196112,17660,19372,32135,161753,127367,46138,33333,10598,644468,439441,117346,556787,1201255
Jun-2007,191218,19496,16468,33263,154373,135804,45502,34068,11278,641470,409739,114975,524714,1166184
Jul-2007,244877,23439,18098,40959,193628,160383,51290,40239,13903,786816,427021,143119,570140,1356956
Aug-2007,268332,27097,19697,42435,216004,177592,52216,49330,16394,869097,521197,169068,690265,1559362
Sep-2007,270622,26564,21348,45992,223257,163807,48900,48002,17166,865658,557565,178842,736407,1602065
Oct-2007,270185,25592,23396,51687,229405,160389,50859,44821,16521,872855,531151,175803,706954,1579809
Nov-2007,276955,23066,26036,53254,247641,164997,53691,47910,15573,909123,569474,171407,740881,1650004
Dec-2007,243713,18223,16309,47920,169698,143149,50706,39059,13304,742081,513484,140569,654053,1396134
Jan-2008,241662,19066,20625,48855,209417,149653,49924,34319,14274,787795,490953,145497,636450,1424245
Feb-2008,217673,19355,18066,46039,187061,145488,52948,30397,13147,730174,383294,114501,497795,1227969
Mar-2008,229253,20678,20575,49429,211295,153312,54517,35216,13085,787360,389709,117641,507350,1294710
Apr-2008,209989,19347,24338,47886,211141,146403,52692,35832,11063,758691,436896,132004,568900,1327591
May-2008,229763,23357,23929,48542,226535,167582,60189,38207,13160,831264,488106,138508,626614,1457878
Jun-2008,231758,25969,23427,50182,230165,169072,59657,38862,13241,842333,483471,143117,626588,1468921
Jul-2008,252110,29938,26804,48504,250049,179553,61294,44459,13990,906701,489868,163646,653514,1560215
Aug-2008,258638,32697,24101,50775,250929,183528,60788,49427,16195,927078,495537,177599,673136,1600214
Sep-2008,248514,30128,23211,50141,236945,165362,57031,49816,15405,876553,516808,186081,702889,1579442
Oct-2008,269094,27411,26461,51667,252795,156000,54178,52906,15881,906393,552970,192708,745678,1652071
Nov-2008,281432,28376,24658,53610,251229,171734,59589,52551,16578,939757,530532,188128,718660,1658417
Dec-2008,251442,24899,19721,46834,232824,152745,51171,40638,15267,835541,482286,164942,647228,1482769
Jan-2009,221711,21539,17114,46722,197604,136186,50310,30962,14674,736822,453219,140154,593373,1330195
Feb-2009,208299,22419,15491,46125,186074,139935,49157,29079,15097,711676,351042,124663,475705,1187381
Mar-2009,206430,21674,18824,44454,202189,147382,53431,33316,15018,742718,396228,136876,533104,1275822
Apr-2009,214957,21735,21204,46131,197749,151318,54598,32429,16104,756225,386326,139301,525627,1281852
May-2009,244809,23552,20013,49080,214064,170341,56732,34952,17135,830678,408480,156117,564597,1395275
Jun-2009,260523,28623,20011,51771,222622,183990,60788,34059,17609,879996,414955,152975,567930,1447926
Jul-2009,273706,33855,21558,54169,244287,197952,62686,39510,18137,945860,396122,150976,547098,1492958
Aug-2009,247842,29435,22710,51071,247657,175623,54262,43937,16660,889197,429902,176974,606876,1496073
Sep-2009,244425,30111,22279,51306,241674,163377,54240,45589,16711,869712,492832,195220,688052,1557764
Oct-2009,326184,32223,31658,60130,316026,198617,64613,63202,21073,1113726,628181,242130,870311,1984037
Nov-2009,253616,23151,20157,43381,240878,158344,52733,44342,14808,851410,546572,197610,744182,1595592
Dec-2009,235343,22691,19553,41983,209636,157820,50519,38898,16001,792444,545835,181784,727619,1520063
Jan-2010,209505,22543,17528,40280,179713,132648,44014,33234,13797,693262,470755,157227,627982,1321244
Feb-2010,193756,21098,18092,39549,168491,117401,40910,30409,13066,642772,396728,117314,514042,1156814
Mar-2010,232430,24622,22181,45877,211511,143793,47915,37747,14875,780951,453627,152656,606283,1387234
Apr-2010,217338,22800,20623,40692,195696,128818,47505,34300,15365,723137,427784,155307,583091,1306228
May-2010,245879,23848,20023,43798,203653,149253,51035,33791,17268,788548,428519,168787,597306,1385854
Jun-2010,258908,27128,19962,47670,207769,172301,54728,39371,18893,846730,405204,167922,573126,1419856
Jul-2010,264492,27558,21870,48164,233444,181426,58903,48116,20792,904765,420933,193198,614131,1518896
Aug-2010,255194,26601,21365,47416,253527,173306,55776,49194,20029,902408,479973,224039,704012,1606420
Sep-2010,245335,23519,19917,48244,245623,165085,53527,46615,18925,866790,472983,223139,696122,1562912
Oct-2010,266321,23083,20168,52066,256693,156394,50683,51491,18960,895859,507820,232807,740627,1636486
Nov-2010,232593,17292,18171,45730,227242,153731,47222,47579,16153,805713,450794,202514,653308,1459021
Dec-2010,186448,17446,14964,36017,165070,133827,41827,36947,14639,647185,400876,184691,585567,1232752
Jan-2011,169897,18647,9861,37708,127784,129893,39819,24506,14680,572795,324374,160748,485122,1057917
Feb-2011,172278,20931,11180,38784,147446,125859,39922,26804,15489,598693,339035,156580,495615,1094308
Mar-2011,229479,26882,14791,51008,201484,163683,51184,33033,20327,791871,434539,189960,624499,1416370
Apr-2011,242911,26164,18255,54459,230795,159048,52970,32126,21001,837729,479086,213136,692222,1529951
May-2011,284842,27681,24396,59950,268674,177939,57193,44652,21531,966858,542987,231951,774938,1741796
Jun-2011,285567,27107,23133,58486,261930,178322,55672,43989,21557,955763,565368,232006,797374,1753137
Jul-2011,278396,29855,25128,56698,264986,181502,56096,51802,20450,964913,547687,238588,786275,1751188
Aug-2011,268496,28666,24359,54635,249314,172237,51986,52716,21196,923605,500727,242814,743541,1667146
Sep-2011,302377,31673,28520,65773,294280,184334,57518,59986,23250,1047711,549530,271829,821359,1869070
Oct-2011,309233,32592,29780,64410,303920,177228,54233,62316,22654,1056366,571050,281148,852198,1908564
Nov-2011,255190,27660,22323,55026,248523,160459,50142,46539,20801,886663,496523,244513,741036,1627699
Dec-2011,223670,24635,20335,46402,200429,152026,46496,37500,18361,769854,492259,232261,724520,1494374
Jan-2012,191594,21640,16033,41797,174113,128483,44872,29933,15292,663757,417900,206637,624537,1288294
Feb-2012,181334,22129,14853,43285,165370,136013,45044,28021,15099,651148,361810,175398,537208,1188356
Mar-2012,180834,22129,15754,46145,168530,134002,47841,29473,14460,659168,371567,180522,552089,1211257
Apr-2012,198155,21547,17646,48694,182706,138685,50810,30235,15334,703812,373200,195610,568810,1272622
May-2012,225183,23862,19254,48919,200701,144996,53819,34233,15701,766668,396741,220326,617067,1383735
Jun-2012,249569,27252,19228,59431,215602,163066,58014,33133,17543,842838,405093,222717,627810,1470648
Jul-2012,278021,32434,21763,66342,268334,196917,63425,45335,20533,993104,437444,261368,698812,1691916
Aug-2012,257313,31114,23992,65054,261518,177882,55608,51050,18769,942300,447029,274920,721949,1664249
Sep-2012,320360,34497,30381,76043,327222,200105,61208,59374,21001,1130191,559545,314502,874047,2004238
Oct-2012,306340,31733,25697,72843,309876,184394,55332,52900,20784,1059899,567584,310947,878531,1938430
Nov-2012,289262,28200,27529,66848,290733,166275,49281,52808,19896,990832,605629,294671,900300,1891132
Dec-2012,268765,27513,23786,60998,263675,160004,46538,46634,18144,916057,599107,264989,864096,1780153
Jan-2013,236865,25327,21871,54864,237134,156926,45740,35316,16317,830360,534842,237291,772133,1602493
Feb-2013,180255,20497,17746,47533,183200,130597,40045,28659,12534,661066,445925,189833,635758,1296824
Mar-2013,196406,18633,19663,44357,189204,127604,39378,29338,14338,678921,453578,192602,646180,1325101
Apr-2013,217956,18940,22241,43038,206701,130785,40570,34991,14405,729627,443323,193594,636917,1366544
May-2013,222980,19438,24042,42909,205750,139501,42665,35575,14175,747035,420917,216378,637295,1384330
Jun-2013,246078,21091,23499,47705,215215,149750,45524,34097,14721,797680,386532,232614,619146,1416826
Jul-2013,278357,27942,28042,58958,258771,179828,51550,43302,17135,943885,435938,295029,730967,1674852
Aug-2013,269148,29593,26635,62246,279040,168836,49906,48109,18807,952320,474667,308297,782964,1735284
Sep-2013,306037,27959,30455,68783,328385,178066,51157,59034,18943,1068819,515982,327374,843356,1912175
Oct-2013,315860,28183,28647,73200,330559,186232,52265,56126,21096,1092168,549063,331047,880110,1972278
Nov-2013,289997,23594,26214,64336,299234,169771,48326,47907,17745,987124,566102,307858,873960,1861084
Dec-2013,253845,22519,20714,54244,228895,166762,43117,32280,16644,839020,549590,280058,829648,1668668
Jan-2014,235748,23643,16682,50360,213875,153947,40061,28068,17365,779749,481341,247890,729231,1508980
Feb-2014,194587,21523,12963,41698,171071,133417,35560,22086,15064,647969,415250,212593,627843,1275812
Mar-2014,223268,22103,19670,46025,218266,152754,40482,30307,16643,769518,475640,251987,727627,1497145
Apr-2014, , , , , , , , , , , , , , 
May-2014, , , , , , , , , , , , , , 
Jun-2014, , , , , , , , , , , , , , 
Jul-2014, , , , , , , , , , , , , , 
Aug-2014, , , , , , , , , , , , , , 
Sep-2014, , , , , , , , , , , , , , 
Oct-2014, , , , , , , , , , , , , , 
Nov-2014, , , , , , , , , , , , , , 
Dec-2014, , , , , , , , , , , , , , 
'''
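
The columns are not independent. In the Jan-2006 row above, for instance, p_malaysia equals the sum of the nine state columns before it, sabah_sarawak equals sabah + sarawak, and malaysia equals p_malaysia + sabah_sarawak (we verified these identities by hand for that row only). A minimal Python 3 sketch, with hypothetical names check_totals and STATES, can check them on the pasted output while skipping the empty months from Apr-2014 onwards:

```python
import csv
import io

# Header plus two rows taken from the output above
HEADER = ('date,johor,kedah,kelantan,negeri_sembilan,pahang,perak,selangor,'
          'terennganu,other_states,p_malaysia,sabah,sarawak,sabah_sarawak,malaysia')
JAN_2006 = ('Jan-2006,145637,10794,12146,22321,108659,100251,34323,21896,8353,'
            '464380,376082,96131,472213,936593')
OUTPUT = HEADER + '\n' + JAN_2006 + '\n' + 'Apr-2014' + ', ' * 14 + '\n'

STATES = ['johor', 'kedah', 'kelantan', 'negeri_sembilan', 'pahang', 'perak',
          'selangor', 'terennganu', 'other_states']


def check_totals(text):
    """Return (date, ok) for each published month; blank months are skipped."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row['malaysia'].strip():
            continue  # month not yet published when the script was run
        v = {k: int(x) for k, x in row.items() if k != 'date'}
        ok = (sum(v[s] for s in STATES) == v['p_malaysia']
              and v['sabah'] + v['sarawak'] == v['sabah_sarawak']
              and v['p_malaysia'] + v['sabah_sarawak'] == v['malaysia'])
        results.append((row['date'], ok))
    return results


print(check_totals(OUTPUT))  # [('Jan-2006', True)]
```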