Define Date Range in an API's response

Hi all

I have some server data from an API call that I’d like to print and export to Excel for further analysis. I want to define a date range to isolate the results I need.

I try this

It doesn’t work, though: I get 000 values. Any idea what could be wrong? If I set just one day in the date_range, e.g. ‘2020-05-01’, it works and I get values for that specific date, but not with a range.

I don’t understand the error. Please, help
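
For readers following along: the code from this post is not shown, so the following is only a minimal sketch of one way a range of days might be built and applied with pandas. The DataFrame `df` and its `value` column are hypothetical stand-ins for the API response, not the poster's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the API response: one row per day.
df = pd.DataFrame(
    {"value": range(10)},
    index=pd.date_range("2020-05-01", periods=10, freq="D"),
)

# Build the range of days to keep (both endpoints are included).
daterange = pd.date_range(start="2020-05-01", end="2020-05-04", freq="D")

# Keep only the rows whose index falls inside the range.
selected = df.loc[df.index.isin(daterange)]
print(len(daterange), len(selected))
```

Printing both lengths is a quick way to confirm that the range contains the expected number of days and that the data actually has a row for each of them.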

I actually managed to correct this error somehow. Now it runs for the date range I define, but at the end of the script run I get the error message "IndexError: index 4 is out of bounds for axis 0 with size 4".

What could this error message mean?

Thanks

In Python something of size 4 has indices 0, 1, 2, 3. I would guess that
you are counting one too far.

Cheers,
Cameron Simpson cs@cskk.id.au

So, in essence, that error does not prevent the code from running to the end? I use 4 days as a date range to retrieve data. If index 0 counts as the first day, then none of the days is missing. So why do I get the error? How can I prevent this from happening?

Thanks

So, in essence, that error does not prohibit code from running to the
end?

Well, it raises an exception, which will terminate your programme. You
should figure out what you’re doing wrong, and adjust your code so that
that does not happen.

I use 4 days as a date range to retrieve data. If this index of 0
counts the first day then none of the days is missing. So, why I get
the error? How can I prevent this from happening?

I’d need to see the code (inline please, not a screenshot or on a site
like pastebin).

Normally you just need to do the right thing, whatever that is. For
example (not using pandas):

L = [5, 6, 7, 8, 9, 10, 11]
L2 = L[0:4]

The list L2 has the first 4 elements of L: 5, 6, 7, 8.
The expression “L2[3]” gives 8, and “L2[4]” will raise an index error
because there is no element with index 4 (that would require 5 list
elements, since we count from 0).

If I were writing a while-loop to print out the elements of L2 I might
write:

i = 0   # starting element
while i < len(L2):
    print(L2[i])
    i = i + 1

Notice that I’m running up to “< len(L2)”, not “<= len(L2)”. len(L2)
will be 4, but the indices themselves stop just before that.

Bear in mind that another possibility is that although you have asked
for 4 days, there might be fewer days than that. Can you print the len()
of the array you get after selecting your 4-day range? Check that the
len() is actually 4?
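
The exact wording of the error in question comes from indexing an array. A minimal sketch that reproduces it, assuming NumPy is installed (the `values` array is made up for illustration):

```python
import numpy as np

values = np.array([10, 20, 30, 40])   # size 4, valid indices are 0..3
print(len(values))

try:
    values[4]                          # one past the last valid index
except IndexError as exc:
    message = str(exc)
    print(message)
```

So "size 4" in the message is the length of whatever is being indexed, and "index 4" is the position that was requested.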

Cheers,
Cameron Simpson cs@cskk.id.au

Hi again and thank you for your contribution so far.

Yes, indeed, len() of my daterange is 4.

My code is this:


for apostalcode in pcList:

      listofdata=[]

      for day in daterange:

          date = day.strftime('%Y-%m-%d') # yields a string

          datelist=[]

          for i in range(0,25):

              temp=pdData[int(apostalcode)][date][i]

What I can infer is that, since the temp line is inside the for i in range(0,25) loop, it looks up data 25 times because of the index i at the end. The [date] index in the middle, however, has only 4 values, so it might yield an error. What I think I should do is create a list of the four days in the previous for loop and then append it to the subsequent one with the 25 values.

What do you think?

Yes, indeed my daterange is 4 after len()

Ok, so you have the expected dates. Well, I hope they’re the expected
dates :-)

for apostalcode in pcList:
      listofdata=[]
      for day in daterange:

Where does daterange come from?

          date = day.strftime('%Y-%m-%d') # yields a string
          datelist=[]
          for i in range(0,25):
              temp=pdData[int(apostalcode)][date][i]

What I can infer is that since temp line is under the for in
range(0,25) loop, it searches for data for 25 times due to the index i
in its end.

Yes, once per loop iteration, for i=0 through i=24 inclusive.

The index [date] in the middle, however, has only 4 values.

That’s expected isn’t it?

Therefore, it might yield an error. What I think I should do, is that I
need to create a list of the four days in the previous for loop and
then append it to the subsequent one with the 25 values.

If “the previous for loop” means the “for day in daterange” I’d tend to
refer to it as the “outer” loop - the loops are inside each other.

So provided daterange has the exact dates you’re looking for there’s no
issue here.

However, it would be helpful to see exactly what keys you’re using
(values of “date”) and exactly what keys are in the array
“pdData[int(apostalcode)]”, since that’s what you’re looking up.

Also, is pdData an (at least) 3-dimensional array? Meaning it has a
postal code index then a date index then a numeric index?

Supposing you recast your loops like this:

print("pcList =", repr(pcList))
for apostalcode in pcList:
      print("apostalcode =", repr(apostalcode))
      pdData_pcode = pdData[int(apostalcode)]
      print("pdData[", int(apostalcode), "] keys =", list(pdData.keys()))
      listofdata=[]
      for day in daterange:
          print("day =", type(day), repr(day))
          date = day.strftime('%Y-%m-%d') # yields a string
          print("date =", type(date), repr(date))
          pdData_pcode_day = pdData_pcode[date]
          print("pdData_pcode[", date, "] =", repr(pdData_pcode_day))
          datelist=[]
          for i in range(0,25):
              print("i =", repr(i))
              temp=pdData_pcode_day[i]

This will produce a lot of output, but should show you exactly what
values are in use up to the point where you get an exception. It should
also show you what date keys are actually in the pdData before you start
accessing it by date. Maybe they are not what you expect.

Cheers,
Cameron Simpson cs@cskk.id.au

Well done, Mr. Cameron. Very useful feedback. I think I can learn a lot from this script. Please take a look at its output. I am posting the last postal code that works, followed by the new postal code where the error begins.

apostalcode = '43813'
pdData[ 43813 ] keys = [43001, 43002, 43412, 43812, 43428, 43425, 43558, 43813, 43154, 43550, 43320, 43527, 43590, 43510, 43500, 43427, 43810, 43595, 43155, 43365, 43720, 43879, 43736, 43515, 43782, 43780, 43774, 43491, 43890, 43151, 43762, 43540, 43700, 43312, 43481, 43206, 43448, 43877, 43350, 43816, 43760, 43202, 43513, 43120, 43514, 43894, 43003, 43004, 43005, 43130, 43008, 43110, 43100, 43006, 43201, 43376, 43746, 43784, 43142, 43792, 43737, 43440, 43141, 43422, 43203, 43204, 43205, 43390, 43880, 43140, 43814, 43747, 43364, 43887, 43007, 43718, 43763, 43362, 43811, 43886, 43370, 43870, 43749, 43373, 43840, 43717, 43772, 43426, 43765, 43423, 43400, 43764, 43881, 43740, 43817, 43711, 43363, 43878, 43375, 43830, 43713, 43310, 43421, 43490, 43580, 43380, 43815, 43593, 43374, 43516, 43392, 43820, 43739, 43730, 43371, 43391, 43750, 43773, 43300, 43470, 43785, 43411, 43330, 43710, 43530, 43712, 43860, 43775, 43896, 43591, 43776, 43361, 43479, 43381, 43559, 43560, 43449, 43382, 43439, 43893, 43719, 43714, 43311, 43883, 43885, 43178, 43529, 43528, 43597, 43461, 43549, 43596, 43882, 43787, 43786, 43783, 43790, 43738, 43340, 43143, 43761, 43791, 43884, 43895, 43420, 43781, 43511, 43360, 43459, 43415, 43410, 43777, 43372, 43424, 43153, 43512, 43892, 43379, 43413, 43152, 43429, 43460, 43771, 43891, 43393, 43592, 43594, 43839, 43715, 43800, 43850, 43321, 43519, 43430, 43748, 43144, 43770, 43450, 43548, 43480, 43570, 43716, 43150, 43520, 43569, 43517, 43897, 43414]
day = <class 'pandas._libs.tslibs.timestamps.Timestamp'> Timestamp('2018-05-01 00:00:00', freq='D')
date = <class 'str'> '2018-05-01'
pdData_pcode[ 2018-05-01 ] = sumContracts    437
1               213
2               204
3               185
4               177
5               176
6               174
7               180
8               172
9               210
10              232
11              246
12              265
13              274
14              283
15              264
16              261
17              243
18              237
19              238
20              252
21              264
22              304
23              272
24              246
Name: 2018-05-01, dtype: object
i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
i = 11
i = 12
i = 13
i = 14
i = 15
i = 16
i = 17
i = 18
i = 19
i = 20
i = 21
i = 22
i = 23
i = 24
day = <class 'pandas._libs.tslibs.timestamps.Timestamp'> Timestamp('2018-05-02 00:00:00', freq='D')
date = <class 'str'> '2018-05-02'
pdData_pcode[ 2018-05-02 ] = sumContracts    437
1               209
2               193
3               174
4               180
5               178
6               172
7               188
8               227
9               267
10              267
11              290
12              321
13              369
14              354
15              296
16              267
17              314
18              313
19              303
20              273
21              285
22              313
23              279
24              218
Name: 2018-05-02, dtype: object
i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
i = 11
i = 12
i = 13
i = 14
i = 15
i = 16
i = 17
i = 18
i = 19
i = 20
i = 21
i = 22
i = 23
i = 24
day = <class 'pandas._libs.tslibs.timestamps.Timestamp'> Timestamp('2018-05-03 00:00:00', freq='D')
date = <class 'str'> '2018-05-03'
pdData_pcode[ 2018-05-03 ] = sumContracts    438
1               190
2               177
3               168
4               169
5               159
6               158
7               186
8               219
9               269
10              292
11              361
12              354
13              376
14              334
15              293
16              264
17              296
18              294
19              312
20              319
21              291
22              325
23              277
24              235
Name: 2018-05-03, dtype: object
i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
i = 11
i = 12
i = 13
i = 14
i = 15
i = 16
i = 17
i = 18
i = 19
i = 20
i = 21
i = 22
i = 23
i = 24
day = <class 'pandas._libs.tslibs.timestamps.Timestamp'> Timestamp('2018-05-04 00:00:00', freq='D')
date = <class 'str'> '2018-05-04'
pdData_pcode[ 2018-05-04 ] = sumContracts    437
1               187
2               166
3               171
4               176
5               166
6               161
7               185
8               219
9               266
10              277
11              287
12              369
13              360
14              338
15              285
16              288
17              324
18              319
19              303
20              310
21              296
22              319
23              269
24              249
Name: 2018-05-04, dtype: object
i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
i = 11
i = 12
i = 13
i = 14
i = 15
i = 16
i = 17
i = 18
i = 19
i = 20
i = 21
i = 22
i = 23
i = 24
apostalcode = '43154'
pdData[ 43154 ] keys = [43001, 43002, 43412, 43812, 43428, 43425, 43558, 43813, 43154, 43550, 43320, 43527, 43590, 43510, 43500, 43427, 43810, 43595, 43155, 43365, 43720, 43879, 43736, 43515, 43782, 43780, 43774, 43491, 43890, 43151, 43762, 43540, 43700, 43312, 43481, 43206, 43448, 43877, 43350, 43816, 43760, 43202, 43513, 43120, 43514, 43894, 43003, 43004, 43005, 43130, 43008, 43110, 43100, 43006, 43201, 43376, 43746, 43784, 43142, 43792, 43737, 43440, 43141, 43422, 43203, 43204, 43205, 43390, 43880, 43140, 43814, 43747, 43364, 43887, 43007, 43718, 43763, 43362, 43811, 43886, 43370, 43870, 43749, 43373, 43840, 43717, 43772, 43426, 43765, 43423, 43400, 43764, 43881, 43740, 43817, 43711, 43363, 43878, 43375, 43830, 43713, 43310, 43421, 43490, 43580, 43380, 43815, 43593, 43374, 43516, 43392, 43820, 43739, 43730, 43371, 43391, 43750, 43773, 43300, 43470, 43785, 43411, 43330, 43710, 43530, 43712, 43860, 43775, 43896, 43591, 43776, 43361, 43479, 43381, 43559, 43560, 43449, 43382, 43439, 43893, 43719, 43714, 43311, 43883, 43885, 43178, 43529, 43528, 43597, 43461, 43549, 43596, 43882, 43787, 43786, 43783, 43790, 43738, 43340, 43143, 43761, 43791, 43884, 43895, 43420, 43781, 43511, 43360, 43459, 43415, 43410, 43777, 43372, 43424, 43153, 43512, 43892, 43379, 43413, 43152, 43429, 43460, 43771, 43891, 43393, 43592, 43594, 43839, 43715, 43800, 43850, 43321, 43519, 43430, 43748, 43144, 43770, 43450, 43548, 43480, 43570, 43716, 43150, 43520, 43569, 43517, 43897, 43414]
day = <class 'pandas._libs.tslibs.timestamps.Timestamp'> Timestamp('2018-05-01 00:00:00', freq='D')
date = <class 'str'> '2018-05-01'
pdData_pcode[ 2018-05-01 ] = sumContracts                                                              0
<generator object outputToJson.<locals>.<genexpr> at 0x7fb3e22e7850>      0
<generator object outputToJson.<locals>.<genexpr> at 0x7fb3e22e7bd0>    NaN
<generator object outputToJson.<locals>.<genexpr> at 0x7fb3e22e7c50>    NaN
Name: 2018-05-01, dtype: object
i = 0
i = 1
i = 2
i = 3
i = 4
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-41-11d623d89ecf> in <module>()
     16           for i in range(0,25):
     17               print("i =", repr(i))
---> 18               temp=pdData_pcode_day[i]

/usr/local/lib/python3.7/dist-packages/pandas/core/series.py in __getitem__(self, key)
    877 
    878         if is_integer(key) and self.index._should_fallback_to_positional():
--> 879             return self._values[key]
    880 
    881         elif key_is_scalar:

IndexError: index 4 is out of bounds for axis 0 with size 4

As you can see, the last postal code that works is 43813, which gives the values we want for the 4 days. I guess the keys listed underneath are the subsequent postal codes associated with that key or the same province. The day is a timestamp (a pandas Timestamp, according to the output). Then we have the hourly (24 hours) values, then the index i running from 0 to 24. I think we created this index to name the columns of a csv file. The same pattern continues for the subsequent days up to 04/05/2018.

Then the new postal code begins and the error occurs, as I get these lines:

<generator object outputToJson.<locals>.<genexpr> at 0x7fb3e22e7850>      0
<generator object outputToJson.<locals>.<genexpr> at 0x7fb3e22e7bd0>    NaN
<generator object outputToJson.<locals>.<genexpr> at 0x7fb3e22e7c50>    NaN

I don’t know what they mean. Could you please help with that? Thanks in advance.

I should add that after those lines, which I included in my previous comment, I only get i indices 0-4, whereas they should be 0-24.

Additionally, the error occurs on the last line, the temp assignment. Does this mean that some postal codes do not have complete daily values? If so, how can I prevent this from happening? Should I skip them somehow, or investigate why the data for those days is incomplete?

  for i in range(0,25):
              print("i =", repr(i))
              temp=pdData_pcode_day[i]  <-------

  if is_integer(key) and self.index._should_fallback_to_positional():
--> 879             return self._values[key]
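
One possible way to skip incomplete days is to check the length of each day's data before indexing into it. The following is only a sketch under assumptions: `pdData` here is a hypothetical nested dict of postal code, then date string, then a pandas Series, which may not match the real structure, and `EXPECTED`, `complete` and `skipped` are names invented for the example.

```python
import pandas as pd

EXPECTED = 25  # sumContracts plus 24 hourly values, as in the output above

# Hypothetical stand-in for pdData: {postal code: {date string: Series}}.
pdData = {
    43813: {"2018-05-01": pd.Series(range(EXPECTED))},
    43154: {"2018-05-01": pd.Series([0, 0, None, None])},  # incomplete day
}

complete, skipped = {}, []
for pcode, by_date in pdData.items():
    for date, series in by_date.items():
        if len(series) < EXPECTED:
            # Too few values: record it for later inspection and move on.
            skipped.append((pcode, date, len(series)))
            continue
        # Positional access with .iloc is safe once the length is checked.
        complete[(pcode, date)] = [series.iloc[i] for i in range(EXPECTED)]

print("skipped:", skipped)
```

Keeping the `skipped` list, rather than silently ignoring those entries, makes it possible to go back afterwards and find out why the data for those postal codes and days is incomplete.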