It's straight-forward how to find one column name:
SELECT utc.table_name FROM user_tab_columns utc WHERE utc.column_name = 'FUND_ID' ORDER BY utc.table_name; TABLE_NAME ------------------------------ ANNUITY_ALLOCATION ANNUITY_FEE ASSIGNMENT BENCHMARK_PROCESSING BENCHMARK_WEIGHT CATEGORY_LINK COLLECTION_LINK COMPOSITE_LINK DAILY_ASSET DAILY_PROFILE DAILY_PROFILE_EXTRA ...If I need to find two such column names, I suppose I could do a self-join:
SELECT t1.table_name, t1.column_name, t2.column_name FROM user_tab_columns t1 INNER JOIN user_tab_columns t2 ON t1.table_name = t2.table_name WHERE t1.column_name = 'FUND_ID' AND t2.column_name = 'SECURITY_ID' ORDER BY t1.table_name; TABLE_NAME COLUMN_NAME COLUMN_NAME ------------------------------ -------------------- -------------------- EXCEPTION_STORE FUND_ID SECURITY_ID SECURITY FUND_ID SECURITY_ID V_DATA_EXCEPTION FUND_ID SECURITY_ID V_HOLDING_RATE FUND_ID SECURITY_ID V_HOLDING_REPORT FUND_ID SECURITY_ID V_TRANS_REPORT FUND_ID SECURITY_IDBut how I do extend this to 3, 4 or 5 columns? Simply adding self-joins doesn't seem to scale particularly well.
What I really need is to obtain the set of all column names as a single column per table name, then somehow cross-reference those against a set of desired column names.
Fortunately, Oracle supports the concept of a multiset. This operator allows me to quickly perform set operations between two sets.
To do this, I first need a schema-level collection type:
CREATE OR REPLACE TYPE varchar2_nt IS TABLE OF VARCHAR2(4000); /4,000 bytes of course being the maximum allowable string size in SQL.
Now, how do I get the set of all column names for each table? Like this:
SELECT ut.table_name ,(SELECT CAST( MULTISET(SELECT utc.column_name FROM user_tab_columns utc WHERE utc.table_name = ut.table_name) AS varchar2_nt ) FROM DUAL) column_list FROM user_tables ut ORDER BY ut.table_name; TABLE_NAME COLUMN_LIST ------------------------------ ---------------------------------------- ACCOUNT VARCHAR2_NT('IS_NONAGS', 'ACCOUNT_ID', ' EXTERNAL_ID', 'NAME', 'DESCRIPTION', 'AC COUNT_CATEGORY', 'RECONCILIATION_RULE', 'PERIOD_BASED', 'FORCE_SHARE_CLASS') ACCOUNT_GROUP VARCHAR2_NT('ACCOUNT_GROUP_ID', 'GROUP_N AME') ACCOUNT_GROUP_LINK VARCHAR2_NT('ACCOUNT_GROUP_ID', 'ACCOUNT _ID', 'SORT_ORDER')The input to MULTISET in this situation is a query. If the query returns no rows, you get an empty collection. The subquery-in-a-column is called a scalar subquery. It must return precisely one value (or NULL). The collection of column names is that one value!
Now that I have a collection of column names per table, how can I scan it for one, two or more column names all at once?
I can perform the intersection of two such collections. The constructor for my schema-level type can be used to provide a hard-coded list, for example:
varchar2_nt('FUND_ID', 'SECURITY_ID', 'END_DATE')Now I can do a MULTISET INTERSECT on this set and each column name set:
SELECT ut.table_name ,(SELECT CAST( MULTISET(SELECT utc.column_name FROM user_tab_columns utc WHERE utc.table_name = ut.table_name) AS varchar2_nt ) FROM DUAL) MULTISET INTERSECT varchar2_nt( 'FUND_ID' ,'SECURITY_ID' ,'END_DATE' ) column_list FROM user_tables ut ORDER BY ut.table_name; TABLE_NAME COLUMN_LIST ------------------------------ ---------------------------------------- ACCOUNT VARCHAR2_NT() ACCOUNT_GROUP VARCHAR2_NT() ACCOUNT_GROUP_LINK VARCHAR2_NT() ACCOUNT_MAP VARCHAR2_NT() ACCOUNT_SUMMARY VARCHAR2_NT() ACCOUNT_VECTORS VARCHAR2_NT() ACCOUNT_VIEW VARCHAR2_NT() ALERT VARCHAR2_NT() ALERT_USER VARCHAR2_NT() ANNUITY_ALLOCATION VARCHAR2_NT('FUND_ID') ANNUITY_FEE VARCHAR2_NT('FUND_ID')This unfortunately gives me an empty collection for those tables without any of these columns, and partial results for 1, 2, or 3 columns. I need some way to discern the fully populated collections from the rest.
For this, I can use the CARDINALITY function. I want only those tables that have all three columns:
SELECT * FROM (SELECT ut.table_name ,(SELECT CARDINALITY( CAST( MULTISET(SELECT utc.column_name FROM user_tab_columns utc WHERE utc.table_name = ut.table_name) AS varchar2_nt ) MULTISET INTERSECT varchar2_nt( 'PORTFOLIO_ID' ,'SECURITY_ID' ,'END_DATE' ) ) FROM DUAL) num_columns FROM user_tables ut) t WHERE t.num_columns = 3 ORDER BY t.table_name; TABLE_NAME NUM_COLUMNS ------------------------------ ----------- SECURITY_ENHANCEMENT 3If I were doing this in a PL/SQL block, I'd avoid the hard-coded constant by defining the collection once:
DECLARE arr_collist varchar2_nt := varchar2_nt( 'PORTFOLIO_ID' ,'SECURITY_ID' ,'END_DATE' ); BEGIN FOR rec IN (SELECT * FROM (SELECT ut.table_name ,(SELECT CARDINALITY( CAST( MULTISET( SELECT utc.column_name FROM user_tab_columns utc WHERE utc.table_name = ut.table_name ) AS varchar2_nt ) MULTISET INTERSECT arr_collist ) FROM DUAL) num_columns FROM user_tables ut) t WHERE t.num_columns = CARDINALITY(arr_collist) ORDER BY t.table_name) LOOP DBMS_OUTPUT.put_line(rec.table_name); END LOOP; END; / SECURITY_ENHANCEMENTNow suppose I want to see which tables share the most number of common column names. I can take the intersection of the self-join of all column lists and compare each table's column list to one another; then sort by the number of common column names.
SELECT table_name1 ,table_name2 ,(column_list1 MULTISET INTERSECT column_list2) column_list3 ,CARDINALITY(column_list1 MULTISET INTERSECT column_list2) card FROM (SELECT ut1.table_name table_name1 ,(SELECT CAST( MULTISET(SELECT utc.column_name FROM user_tab_columns utc WHERE utc.table_name = ut1.table_name) AS varchar2_nt ) FROM DUAL) column_list1 ,t2.table_name2 ,t2.column_list2 FROM user_tables ut1 INNER JOIN (SELECT ut2.table_name table_name2 ,(SELECT CAST( MULTISET( SELECT utc.column_name FROM user_tab_columns utc WHERE utc.table_name = ut2.table_name ) AS varchar2_nt ) FROM DUAL) column_list2 FROM user_tables ut2) t2 ON ut1.table_name != t2.table_name2) WHERE CARDINALITY((column_list1 MULTISET INTERSECT column_list2)) > 0 ORDER BY card DESC; TABLE_NAME1 TABLE_NAME2 ------------------------------ ------------------------------ COLUMN_LIST3 -------------------------------------------------------------------------------- CARD ---------- MONTHLY_RETURN MONTHLY_RETURN_MATRIX VARCHAR2_NT('FUND_ID', 'PROFILE_DATE', 'ROR_1M_C_NOLOAD', 'ROR_1M_G_NOLOAD', 'RO R_2M_C_NOLOAD', 'ROR_3M_C_NOLOAD', 'ROR_4M_C_NOLOAD', 'ROR_5M_C_NOLOAD', 'ROR_6M _C_NOLOAD', 'ROR_7M_C_NOLOAD', 'ROR_8M_C_NOLOAD', 'ROR_9M_C_NOLOAD', 'ROR_10M_C_ NOLOAD', 'ROR_11M_C_NOLOAD', 'ROR_YTD_C_NOLOAD', 'ROR_YTD_G_NOLOAD', 'ROR_1Y_C_N OLOAD', 'ROR_2Y_A_NOLOAD', 'ROR_2Y_C_NOLOAD', 'ROR_3Y_A_NOLOAD', 'ROR_3Y_C_NOLOA D', 'ROR_4Y_A_NOLOAD', 'ROR_4Y_C_NOLOAD', 'ROR_5Y_A_NOLOAD', 'ROR_5Y_C_NOLOAD', 'ROR_6Y_A_NOLOAD', 'ROR_7Y_A_NOLOAD', 'ROR_8Y_A_NOLOAD', 'ROR_9Y_A_NOLOAD', 'ROR _10Y_A_NOLOAD', 'ROR_10Y_C_NOLOAD', 'ROR_15Y_C_NOLOAD', 'ROR_15Y_A_NOLOAD', 'ROR _20Y_C_NOLOAD', 'ROR_20Y_A_NOLOAD', 'ROR_25Y_C_NOLOAD', 'ROR_25Y_A_NOLOAD', 'ROR _INCP_C_NOLOAD', 'ROR_INCP_A_NOLOAD', 'ROR_ADV_INCP_C_NOLOAD', 'ROR_ADV_INCP_A_N OLOAD', 'ROR_USER_INCP_C_NOLOAD', 'ROR_USER_INCP_A_NOLOAD', 'ROR_PERF_INCP_C_NOL OAD', 'ROR_PERF_INCP_A_NOLOAD', 'ROR_1M_C_LOAD', 'ROR_2M_C_LOAD', 'ROR_3M_C_LOAD ', 'ROR_4M_C_LOAD', 'ROR_5M_C_LOAD', 'ROR_6M_C_LOAD', 'ROR_7M_C_LOAD', 'ROR_8M_C _LOAD', 'ROR_9M_C_LOAD', 'ROR_10M_C_LOAD', 'ROR_11M_C_LOAD', 'ROR_YTD_C_LOAD', ' ROR_1Y_C_LOAD', 'ROR_2Y_A_LOAD', 'ROR_2Y_C_LOAD', 'ROR_3Y_A_LOAD', 'ROR_3Y_C_LOA D', 'ROR_4Y_A_LOAD', 'ROR_4Y_C_LOAD', 'ROR_5Y_A_LOAD', 'ROR_5Y_C_LOAD', 'ROR_6Y_ A_LOAD', 'ROR_7Y_A_LOAD', 'ROR_8Y_A_LOAD', 'ROR_9Y_A_LOAD', 'ROR_10Y_A_LOAD', 'R OR_10Y_C_LOAD', 'ROR_15Y_C_LOAD', 'ROR_15Y_A_LOAD', 'ROR_20Y_C_LOAD', 'ROR_20Y_A _LOAD', 'ROR_25Y_C_LOAD', 'ROR_25Y_A_LOAD', 'ROR_INCP_C_LOAD', 'ROR_INCP_A_LOAD' , 'ROR_ADV_INCP_C_LOAD', 'ROR_ADV_INCP_A_LOAD', 'ROR_USER_INCP_C_LOAD', 'ROR_USE R_INCP_A_LOAD', 'ROR_PERF_INCP_C_LOAD', 'ROR_PERF_INCP_A_LOAD', 'RECORD_GUID', ' CREATORDB_ID', 'REVISION_NBR', 'GENERATION_ID', 'REFERENCE_COUNT', 'SYNC_SESSION _ID') 92I do get the same pair of tables listed twice, first as table_name1-table_name2 and then as table_name2-table_name1.
I can also use the MULTISET and related operators and functions directly in assignment statements in PL/SQL. For example:
DECLARE x varchar2_nt := varchar2_nt(); BEGIN x := varchar2_nt( 'a' ,'b' ,'b' ,'c' ) MULTISET INTERSECT DISTINCT varchar2_nt( 'b' ,'c' ,'c' ,'d' ); IF (x.COUNT > 0) THEN DBMS_OUTPUT.put_line('There are ' || x.COUNT || ' common elements:'); FOR i IN x.FIRST .. x.LAST LOOP DBMS_OUTPUT.put_line(x(i)); END LOOP; ELSE DBMS_OUTPUT.put_line('There are no common elements'); END IF; END; /There are 2 common elements:
b c
If you play around with this, you'll see the intersection of a non-empty set and an empty set is an empty set.
In addition to INTERSECT [DISTINCT], there's UNION [DISTINCT] and EXCEPT [DISTINCT].
No comments:
Post a Comment