Location Set Covering Problem (LSCP) - Real World Facility Location Problem

geospatial

map

spatial optimization

drawing

Introduction

Location Set Covering Problem (LSCP) fall within a class of problems known as set covering problems. The set covering problem class is an NP-hard problem within the combinatorial optimization space. These problems typically aim at minimizing the number of sets within a space that cover all the demand within that space.

The Location Set Covering Problem is described by Toregas et al. (1971). There they show that emergency services must be placed according to a response time, since there is a allowable maximum service time for handling emergency activities. Therefore the LSCP model was proprosed whereby

the minimum number of facilities determined and locaten so that every demand area is covered within a predefined maximal service distance or time

(Church and Murray, 2018).

The general scenarios for this project can be stated as: store sites should supply the demand in this census tract considering the distance between the two sets of sites: demand points and candidate supply sites. Four fundamental facility location models are utilized to highlight varying outcomes dependent on objectives: LSCP (Location Set Covering Problem), MCLP (Maximal Coverage Location Problem), P-Median and P-Center.

Location Set Covering Problem (LSCP)

Location Set Covering is a problem realized by Toregas, et al. (1971). He figured out that emergency services must have placed according to a response time, since, there is a allowable maximum service time when it’s discussed how handle an emergency activity. Minimize the number of facilities needed and locate them so that every demand area is covered within a predefined maximal service distance or time. Since the model needs a distance cost matrix we should define some variables.

Maximal Coverage Location Problem (MCLP)

LSCP try to minimize the amount of facilities candidate sites in a maximum service standard but then arise another problem: the budget. Sometimes it requires many facilities sites to reach a complete coverage, and there are circumstances when the resources are not available and it’s plausible to know how much coverage we can reach using a exact number of facilities. MCLP class try to solve this problem:

Maximize the amount of demand covered within a maximal service distance or time standard by locating a fixed number of facilities self.

P-Center Problem

Hakimi (1964) introduced the absolute center problem to locate a police station or a hospital such that the maximum distance of the station to a set of communities connected by a highway system is minimized. Hakimi (1965) generalizes the absolute center problem to the p-center problem.

P-Median Problem

Hakimi (1965) proposed this model thinking about locating telephone switching centers. Then the objective was to minimize the total length of wire serving all customers while locating telephone witching centers, assuming that each customer would be connected by wire to the closest switching center. Locate a fixed number of facilities such that the resulting sum of travel distances is minimized

Dataset

Load dataset

For this project, we will solve mulitple real world facility location problems using a dataset describing an area of census tract 205, San Francisco.

The problem is composed by 4 dataset:

network_distance is the calculus of distance between facility candidate sites calculated by ArcGIS Network Analyst Extension
demand_points represents the demand points with some important features for the facility location problem like population
facility_points represents the stores that are candidate facility sites
study_area is the polygon of census tract 205.

network_distance = pd.read_csv(r"..\map\map_service_area\SF_network_distance_candidateStore_16_censusTract_205_new.csv")

demand_points = pd.read_csv(r"..\map_service_area\SF_demand_205_centroid_uniform_weight.csv", index_col=0)
demand_points_gdf = gpd.GeoDataFrame( demand_points,geometry=gpd.points_from_xy(demand_points.long, demand_points.lat),).sort_values(by=['NAME']).reset_index()

facility_points = pd.read_csv(r"..\map_service_area\SF_store_site_16_longlat.csv", index_col=0)
facility_points_gdf = gpd.GeoDataFrame( facility_points, geometry=gpd.points_from_xy(facility_points.long, facility_points.lat),).sort_values(by=['NAME']).reset_index()

study_area = gpd.read_file(r"..\map_service_area\ServiceAreas_4.shp").dissolve()

network distance

	distance	name	DestinationName	demand
0	671.573	Store_1	6.07505e+07	6540
1	1333.71	Store_1	6.07505e+07	3539
2	1656.19	Store_1	6.07504e+07	4436
3	1783.01	Store_1	6.07506e+07	231
4	1790.95	Store_1	6.07505e+07	7787

demand points

	index	OBJECTID	ID	NAME	STATE_NAME	AREA	POP2000	HOUSEHOLDS	HSE_UNITS	BUS_COUNT	long	lat	geometry
0	136	139	6075010100	6.07501e+07	California	0.29101	2879	1760	1944	538	-122.411	37.8054	POINT (-122.411302937 37.8053570610001)
1	195	198	6075010200	6.07501e+07	California	0.19945	4288	2767	3044	256	-122.422	37.8027	POINT (-122.42182845 37.8026681890001)
2	196	199	6075010300	6.07501e+07	California	0.1047	4092	2092	2327	78	-122.415	37.8013	POINT (-122.415180848 37.8012525410001)
3	191	194	6075010400	6.07501e+07	California	0.13005	4859	2630	2899	209	-122.408	37.8024	POINT (-122.407959434 37.8023538830001)
4	42	42	6075010500	6.07501e+07	California	0.26295	2217	1521	1843	925	-122.4	37.7987	POINT (-122.399951658 37.798735326)

facility points

	index	OBJECTID	NAME	long	lat	geometry
0	1	1	Store_1	-122.51	37.7724	POINT (-122.510018182 37.7723636370001)
1	8	8	Store_11	-122.434	37.6554	POINT (-122.433781818 37.6553636360001)
2	9	9	Store_12	-122.439	37.7192	POINT (-122.438981818 37.719236364)
3	10	10	Store_13	-122.44	37.7454	POINT (-122.440218181 37.7453818180001)
4	11	11	Store_14	-122.422	37.743	POINT (-122.421636364 37.7429636370001)

polygon of census tract 205

	geometry	FacilityID	Name	FromBreak	ToBreak	FacilityOI	NAME_1	Shape_Leng	Shape_Area
0	POLYGON ((-122.45299324702856 37.638982033270395, -122.4541513019999 37.638532816000065, ))	8	Store_11	0	5000	8	Store_11	0.287448	0.00333851

drawing

Data Preparation

Before running any analysis, we need to ensure that our data is well-structured and ready for processing. This involves key preprocessing steps that transform raw datasets into a usable format for modeling. Here’s how we do it:

creating a cost matrix, The first step is to pivot the network_distance dataframe into a cost matrix. This matrix represents the travel distance between different locations in our dataset, making it easier to analyze accessibility and connectivity.
at demand_points dataset, we need a way to quantify demand in each area. In this case, we use population data from the year 2000 as a proxy for demand weight.

Defining Key Parameters

To fine-tune our analysis, we set a few important parameters that guide facility placement decisions:

Service Distance (service_dist = 5000)
- Each facility can serve a maximum radius of 5,000 meters (or 5 km).
- Any demand point outside this radius will not be considered directly reachable.
- This ensures realistic coverage for service areas, balancing accessibility with efficiency.
Number of Facilities (p_facility = 4)
- We assume that only 4 facilities will be placed in this area.
- This constraint forces an optimization process—where to position these 4 facilities to maximize service coverage?

network distance matrics

DestinationName	Store_1	Store_11	Store_12	Store_13	Store_14	Store_15	Store_16	Store_17	Store_18	Store_19	Store_2	Store_3	Store_4	Store_5	Store_6	Store_7
6.07501e+07	11495.2	20022.7	10654.6	8232.54	7561.4	4139.77	4805.81	2055.53	225.609	1757.62	11522.5	7529.99	10847.2	10604.7	20970.3	15243
6.07501e+07	10436.2	19392.1	10024	7601.97	6930.83	3093.85	4175.23	1257.81	1041.91	2333.24	10509.1	6470.97	10216.7	9974.16	20339.7	14612.4
6.07501e+07	10746.3	19404.7	10036.6	7614.55	6943.41	3381.78	4187.81	2046.44	744.584	1685.52	10800.9	6778.89	10229.2	9986.74	20352.3	14625
6.07501e+07	11420.5	19808.4	10440.3	8018.24	7347.1	4044.47	4591.51	2463.74	795.715	1282.22	11308.2	7447.19	10632.9	10390.4	20756	15028.7
6.07501e+07	11379.4	19583.9	10215.8	7793.8	7122.65	4103.73	4367.06	3320.28	1731.46	249.67	11083.8	7379.54	10408.5	10166	20531.5	14804.2

The Code

By calling maps.spatial_cluster_opt, we can identify the optimal store locations to serve all of the demands.

result, model, fac2cli, fac2cli_idx = maps.spatial_cluster_opt(main_data=cost_matrix, types='lscp',threshold=service_dist)

This function requires the following parameters:

main_data (string): Data location and value
weights (list): list of value that we want to maximize
types (string): Type of optimization
threshold (int): the minimum number of point each region must have
p_facilities (int): total point (i.e. medical locations )
max_radius (int): maximum radius for each point

Then, by using maps.plot_cluster_opt we can plot the results to identify which stores were selected and what part of the population they are serving.

maps.plot_cluster_opt(model=result,facility_points=medical_center_locs,
                    demand_points=patient_locs,map_area=streets_gpd,p_facilities=4
                    fac2cli_idx=[],types ='pulp',title="LSCP - Cost Matrix Solution"
                    center_map=[])

This function requires the following parameters:

model (string): result of model
facility_points (dataframe): facility location data
demand_points (dataframe): demand or point location data
map_area (dataframe): area map data
fac2cli (dataframe): cost matrix
p_facilities (int): total point (i.e. medical locations )
fac2cli_idx (dataframe): cost matrix index
types (string): type of plot
title (string): title
center_map (list):

The result

Location Set Covering Problem (LSCP)

drawing drawing

Maximal Coverage Location Problem (MCLP)

MCLP Percentage Covered: 89.7560 drawing drawing

P-Center Problem

problem objective: 7403.06

drawing drawing

P-Median Problem

p-median mean weighted distance: 2982.1268

problem objective: 2848268129.7145

drawing drawing

The plot above clearly demonstrates the key features of each location model:

LSCP Overlapping clusters of clients associated with multiple facilities All clients covered by at least one facility.
MCLP Overlapping clusters of clients associated with multiple facilities Potential for complete coverage, but not guranteed.
p-median Tight and distinct clusters of client-facility relationships All clients allocated to exactly one facility
p-center Loose and overlapping clusters of client-facility relationships All clients allocated to exactly one facility