
I need to write a function that receives a number, searches a data file (sunspots) for that number, and returns each matching index as a month and year.

The logic is the following: index 0 corresponds to Jan, 1749 and the last index to Dec, 1983. The months cycle as the index increases, and the year increments at every index that is a multiple of 12.

My code so far is:

count = float(input("Enter a number of sunspots: "))
def get_year_and_month(count):
    x = sunspots
    if any(x == count):
        print(np.where(x == count))
get_year_and_month(count)

Which returns:

Enter a number of sunspots: 58
(array([   0,  516, 1591], dtype=int64),)

The expected output is Jan, 1749 for index 0, and then whatever the month and year would be for the 516th and 1591st index.

Is there a good way to change the indices to the format mentioned above?

buddemat
Myaanz
  • What is the expected output? – tsamridh86 Aug 17 '21 at 02:06
  • The expected output is: Jan, 1749, then whatever the month and year would be for the 516th and 1591st index. Taking into account that the months loop and then +1 to the year each time it hits the n*12th index – Myaanz Aug 17 '21 at 02:11
  • Sorry if that is hard to understand. I am new to coding in general and find it hard to express what I am trying to do without the full vocabulary – Myaanz Aug 17 '21 at 02:12

1 Answer


NumPy arrays cannot use strings as their index.

But, if I understand you correctly, you have no real need to change the index; you just want to get the corresponding month and year based on the index.

Calculating the correct month and year from the index is pretty straightforward. You can use the modulo and floor-division operators for that.

Since the year increments every 12th index, you can calculate it as 1749 (the base year) plus the result of the floor division of the index by 12:

yr = idx // 12 + 1749

The month is the index modulo 12 plus one (since the index starts at zero):

mnth = idx % 12 + 1
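
As a quick check of the two formulas, here is a minimal sketch in plain Python that uses the built-in divmod() to get the quotient and remainder in one step. The helper name index_to_year_month and the constant BASE_YEAR are made up for illustration; the indices are the three from the question plus 2819, which is assumed to be the last index if the data runs without gaps through Dec, 1983.

```python
BASE_YEAR = 1749  # year of index 0, as stated in the question


def index_to_year_month(idx):
    """Return (year, month_number) for a zero-based monthly index."""
    # divmod(idx, 12) is equivalent to (idx // 12, idx % 12)
    years_passed, month_offset = divmod(idx, 12)
    return BASE_YEAR + years_passed, month_offset + 1


for idx in (0, 516, 1591, 2819):
    print(idx, index_to_year_month(idx))
# 0 (1749, 1)
# 516 (1792, 1)
# 1591 (1881, 8)
# 2819 (1983, 12)
```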

Since you don't want the month as a number but as an abbreviated string, you need one of the several ways to get the month name from a number. I simply created a list to look up the month abbreviation, as in this SO answer, since this can later be used efficiently for a lookup with np.take().

months = ['n/a', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

mnth_abbrev = months[idx % 12 + 1]
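
Another of those possibilities, shown here only as a hedged alternative to the hand-written list, is the standard library's calendar.month_abbr. Like the list above, it has an empty placeholder at position 0, so month numbers 1 to 12 map directly. Note that its names depend on the active locale; the sketch assumes the default (English) one, and the helper name index_to_label is hypothetical.

```python
import calendar


def index_to_label(idx, base_year=1749):
    """Format a zero-based monthly index as e.g. 'Jan, 1749'."""
    year = base_year + idx // 12
    # month_abbr[1] is 'Jan' ... month_abbr[12] is 'Dec' in the default locale
    month_name = calendar.month_abbr[idx % 12 + 1]
    return f"{month_name}, {year}"


print(index_to_label(516))
```

This is convenient for single values; for whole arrays, a plain list combined with np.take() remains the better fit.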

To apply this efficiently to all values of a NumPy array at once rather than to a single index, we can use the vectorized counterparts of the operations above, which are optimized for performance: np.mod() (for %) and np.floor_divide() (for //). We also use np.take() (for x[y], see this SO post). As a minor change, I used np.flatnonzero() instead of np.where() to get the indices, since we want an array as the result rather than a tuple (see this SO post for further information). Finally, I used np.char.add() to concatenate month and year, and changed the months list slightly to include a comma and a space after each month abbreviation, which produces your desired output.
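
The difference between np.where() and np.flatnonzero() mentioned above can be seen in a small sketch. The five-element array is made-up sample data, not the real sunspot series.

```python
import numpy as np

spots = np.array([58, 12, 4, 0, 58])  # made-up sample data
mask = spots == 58

as_tuple = np.where(mask)        # tuple holding one index array per dimension
as_array = np.flatnonzero(mask)  # plain 1-D array of indices

print(type(as_tuple).__name__, as_tuple[0])
print(type(as_array).__name__, as_array)

# The vectorized arithmetic can then be applied to the index array directly
print(np.mod(as_array, 12) + 1, np.floor_divide(as_array, 12) + 1749)
```

With np.where() you would have to unpack the first tuple element before doing arithmetic on it, which is why np.flatnonzero() is slightly more convenient here.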

Full example:

import numpy as np

sunspots = np.array([58,12,4,0,58,1,548,45,0,0,2,58,58,1,12,2,0,58])

months = ['n/a', 'Jan, ', 'Feb, ', 'Mar, ', 'Apr, ', 'May, ', 'Jun, ', 'Jul, ', 'Aug, ', 'Sep, ', 'Oct, ', 'Nov, ', 'Dec, ']


def get_year_and_month(cnt, spots):
    hits = np.flatnonzero(spots == cnt)
    mths = np.take(months, np.mod(hits, 12) + 1)
    yrs = np.floor_divide(hits, 12) + 1749
    return np.char.add(mths, yrs.astype('str'))

count = int(input("Enter a number of sunspots: "))

res = get_year_and_month(count, sunspots)

if res.size != 0:
    print(np.vstack(res))

Output:

Enter a number of sunspots: 58
[['Jan, 1749']
 ['May, 1749']
 ['Dec, 1749']
 ['Jan, 1750']
 ['Jun, 1750']]

For your data, the result would be:

Enter a number of sunspots: 58
[['Jan, 1749']
 ['Jan, 1792']
 ['Aug, 1881']]

If you call this function repeatedly, an alternative is to construct the full date range once as an array and look up the matching indices in it:

import numpy as np

sunspots = np.array([58,12,4,0,58,1,548,45,0,0,2,58,58,1,12,2,0,58])

months = ['n/a', 'Jan, ', 'Feb, ', 'Mar, ', 'Apr, ', 'May, ', 'Jun, ', 'Jul, ', 'Aug, ', 'Sep, ', 'Oct, ', 'Nov, ', 'Dec, ']

dates = np.arange(0,sunspots.shape[0])
dates = np.char.add(np.take(months, np.mod(dates, 12) + 1), (np.floor_divide(dates, 12) + 1749).astype('str'))

def get_year_and_month(cnt, spots, dts):
    hits = np.flatnonzero(spots == cnt)
    return np.take(dts, hits)

count = int(input("Enter a number of sunspots: "))

res = get_year_and_month(count, sunspots, dates)

if res.size != 0:
    print(np.vstack(res))

The output is identical to the one above.


In case you really do want a data structure with a string index, I suggest you use a Pandas DataFrame. You can e.g. simply build the dates array in the same way and pass it as the index parameter of the DataFrame() call. Then find your results with df.index:

import numpy as np
import pandas as pd

sunspots = np.array([58,12,4,0,58,1,548,45,0,0,2,58,58,1,12,2,0,58])

months = ['n/a', 'Jan, ', 'Feb, ', 'Mar, ', 'Apr, ', 'May, ', 'Jun, ', 'Jul, ', 'Aug, ', 'Sep, ', 'Oct, ', 'Nov, ', 'Dec, ']

idx = np.arange(sunspots.shape[0])
dates = np.char.add(np.take(months, np.mod(idx, 12) + 1), (np.floor_divide(idx, 12) + 1749).astype('str'))
df = pd.DataFrame(data=sunspots, index=dates, columns=['spots'])

count = int(input("Enter a number of sunspots: "))

print(df.index[df['spots'] == count].tolist())

Output:

Enter a number of sunspots: 58
['Jan, 1749', 'May, 1749', 'Dec, 1749', 'Jan, 1750', 'Jun, 1750']

Since the dataframe now really has the date string as an index, you can also access your sunspot data via this string index:

>>> df
           spots
Jan, 1749     58
Feb, 1749     12
Mar, 1749      4
Apr, 1749      0
May, 1749     58
Jun, 1749      1
Jul, 1749    548
Aug, 1749     45
Sep, 1749      0
Oct, 1749      0
Nov, 1749      2
Dec, 1749     58
Jan, 1750     58
Feb, 1750      1
Mar, 1750     12
Apr, 1750      2
May, 1750      0
Jun, 1750     58

>>> df['spots']['Jan, 1750']
58
buddemat