How to troubleshoot a bad model fit? #51

@Javier-Acuna

Hello,

I tried to use your library with some measurement data, but the model obtained is not a good fit.

The x and y values are already normalized. I have tried different partitions of the train/test sets and raised max_iter to 50000, but beyond that I don't know what else to try. Is the number of data points too small? Can ffx not handle non-deterministic data (measurements with some random noise)? Any help would be very much appreciated.

Here is a sample code:

import ffx
import numpy as np
import matplotlib.pyplot as plt

# Data to model, two measurements of 'y' for each 'x' value
x = np.array([[-1.392], [-0.985], [-0.308], [0.293], [1.046], [1.347], [-1.392], [-0.985], [-0.308], [ 0.293], [ 1.046], [ 1.347]])
y = np.array([[-1.691], [-0.925], [ 0.109], [0.768], [0.826], [0.829], [-1.673], [-1.049], [ 0.123], [ 0.833], [ 0.947], [ 0.903]])

# Plot y vs x
fig, ax = plt.subplots(1)
ax.scatter(x, y, facecolor='b', marker='o')

# Split into train and test sets, two possibilities:
if False:  # Alternate values of 'x' in train set
    x_train = x[1:12:2].reshape( (6,1) )
    y_train = y[1:12:2].reshape( (6,1) )
    
    x_test = x[0:12:2].reshape( (6,1) )
    y_test = y[0:12:2].reshape( (6,1) )
else:  # First measurement of each 'x' value in train set, second in test set
    x_train = x[0:6].reshape( (6,1) )
    y_train = y[0:6].reshape( (6,1) )
    
    x_test = x[6:12].reshape( (6,1) )
    y_test = y[6:12].reshape( (6,1) )

# Plot train/test sets
fig, ax = plt.subplots(1)
ax.scatter(x_train, y_train, facecolor='b', marker='o', label='train')
ax.scatter(x_test,  y_test,  facecolor='b', marker='x', label='test')
ax.legend()

# max_iter changed to 50000 in model_factories.py
models = ffx.run(x_train, y_train, x_test, y_test, varnames=['x'])

for model in models:
    yhat = model.simulate(x_test)
    print(model)

    fig, ax = plt.subplots(1)
    ax.scatter(x, y, facecolor='b', marker='o', label='measurement')
    ax.scatter(x_test, yhat, facecolor='r', marker='x', label='model')
    ax.legend()

# Display all figures (needed when not running in an interactive backend)
plt.show()

The models I obtain with the first partition are:
0.227
0.187 + 0.179*x
[Figure: Figure_Github_ffx_Partition_1]

and with the second partition:
-0.0140
0.0116 / (1.0 - 0.150*abs(x))
[Figure: Figure_Github_ffx_Partition_2]
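
To put a rough number on how poor the fit is, the sketch below evaluates the two models reported for the second partition on the test set and compares them with a plain least-squares quadratic. The quadratic baseline (`np.polyfit`) and the `rmse` helper are assumptions added for comparison only; they are not part of ffx and say nothing about how ffx fits its models.

```python
import numpy as np

# Data copied from the sample code above, second partition:
# first measurement of each x is the train set, second is the test set.
x = np.array([-1.392, -0.985, -0.308, 0.293, 1.046, 1.347])
y_train = np.array([-1.691, -0.925, 0.109, 0.768, 0.826, 0.829])
y_test = np.array([-1.673, -1.049, 0.123, 0.833, 0.947, 0.903])


def rmse(y_true, y_pred):
    """Root-mean-square error (hypothetical helper, not an ffx function)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# The two models reported for the second partition.
yhat_const = np.full_like(x, -0.0140)
yhat_ratio = 0.0116 / (1.0 - 0.150 * np.abs(x))

# Baseline for comparison only: ordinary least-squares quadratic on the train set.
coeffs = np.polyfit(x, y_train, deg=2)
yhat_poly = np.polyval(coeffs, x)

rmse_const = rmse(y_test, yhat_const)
rmse_ratio = rmse(y_test, yhat_ratio)
rmse_poly = rmse(y_test, yhat_poly)

print(f"constant model RMSE:  {rmse_const:.3f}")
print(f"rational model RMSE:  {rmse_ratio:.3f}")
print(f"quadratic baseline:   {rmse_poly:.3f}")
```

Since y is normalized, a test RMSE close to 1 means a model explains almost none of the variation, which is the case for both reported models; a baseline that does clearly better on the same six points would suggest the data itself is not the limiting factor.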

Do you have any ideas about what I should try next?
